Abstract

We define the Ricci curvature of Markov chains on metric spaces as 
a local contraction coefficient of the random walk acting on the space 
of probability measures equipped with a Wasserstein transportation 
distance. For Brownian motion on a Riemannian manifold this gives 
back the value of Ricci curvature of a tangent vector. Examples of 
positively curved spaces for this definition include the discrete cube 
and discrete versions of the Ornstein-Uhlenbeck process. Moreover 
this generalization is consistent with the Bakry-Emery Ricci curvature 
for Brownian motion with a drift on a Riemannian manifold. 

Positive Ricci curvature is shown to imply a spectral gap, a Levy- 
Gromov-like Gaussian concentration theorem and a kind of modified 
logarithmic Sobolev inequality. The bounds obtained are sharp in 
several interesting examples. 

Introduction 

There are numerous generalizations of the notion of a metric space with
negative sectional curvature: manifolds with negative sectional curvature,
CAT(0) and CAT(-1) spaces or $\delta$-hyperbolic spaces are widely used in
various branches of mathematics and give rise to numerous theorems. For
positive curvature in Riemannian geometry, the right concept seems to be a lower
bound on Ricci curvature (which is weaker than a lower bound on sectional
curvature). The most basic result in this direction is the Bonnet-Myers
theorem bounding the diameter of the space in terms of the Ricci curvature,
but let us mention Lichnerowicz' theorem for the spectral gap of the
Laplacian (Theorem 181 in [Ber03]), the Levy-Gromov theorem for isoperimetric
inequalities and concentration of measure [Gro86], or Gromov's theorem on
precompactness of the space of manifolds with given dimension, upper bound
on the diameter and lower bound on the Ricci curvature.
We refer to the nice survey [Lott] for a discussion of the geometric interest
of lower bounds on Ricci curvature, with further references, and the need 
for a generalized notion of positive Ricci curvature for metric spaces (often 
equipped with a measure). 

There have been several generalizations of the notion of Ricci curvature. 
First, the study by Bakry and Emery [BE85] of hypercontractivity of diffusion
processes led them to show that, when considering the Brownian motion on
a manifold with an additional drift given by a tangent vector field $F$, the
quantity $\mathrm{Ric} - 2\nabla^{\mathrm{sym}} F$ plays the role of a Ricci curvature for the process,
as far as functional inequalities are concerned. The main example is the
Ornstein-Uhlenbeck process on $\mathbb{R}^N$, whose invariant distribution is Gaussian,
and which is positively curved in this sense. 

Later, simultaneously, Sturm [Stu06], Lott and Villani [LV], and Ohta [Oht]
used ideas from optimal transportation theory to define a notion of lower 
bound on the Ricci curvature for length spaces equipped with a measure. 
Their definition keeps a lot of the properties traditionally associated with 
positive Ricci curvature, and is compatible with the Bakry-Emery extension. 
However, it has two main drawbacks. First, it is infinitesimal, and in partic- 
ular is meaningless for a graph. Second, the definition is rather involved and 
difficult to check on concrete examples. The main class of spaces for which 
this definition is interesting are Gromov-Hausdorff limits of manifolds of a 
given dimension. 

Here we propose a definition of Ricci curvature for metric spaces equipped 
with a Markov chain or a diffusion process (which for a Riemannian manifold 
will typically be Brownian motion), which is hopefully simpler to check on 
examples. The definition is again based on optimal transportation, but in a 
less infinitesimal way, and can be used to define a notion of "curvature at a 
given scale" for a metric space. As a consequence, we can test it in discrete 
spaces such as graphs. Such an example is the discrete cube $\{0,1\}^N$, which
from the point of view of concentration of measure behaves very much like
the sphere $S^N$, and is thus expected to somehow have positive curvature.

Our definition, when applied to a Riemannian manifold equipped with the 
Brownian motion, gives back the usual value of the Ricci curvature of a
tangent vector. It is consistent with the Bakry-Emery extension, and provides a
visual explanation for the contribution $-\nabla^{\mathrm{sym}} F$ of the drift $F$. We are able
to prove generalizations of the Bonnet-Myers theorem, of the Lichnerowicz 
spectral gap theorem and of the Levy-Gromov isoperimetry theorem, as well 
as a kind of modified logarithmic Sobolev inequality, although with some 
(bounded) loss in the constants. As a by-product, we get a new proof for 
Gaussian concentration and the logarithmic Sobolev inequality in the Levy- 
Gromov or Bakry-Emery context (though the constants are not sharp). 
Related work. After having written a first version of this text, we learned
that related ideas appear in several recent papers. Joulin [Jou] uses
contraction of the Lipschitz constant (under the name "Wasserstein curvature")
to get a Poisson-type concentration result for continuous-time Markov chains
on a countable space, at least in the bounded, one-dimensional case. Oliveira
[Oli] proves that Kac's random walk on SO(n) has positive Ricci curvature in
our sense, which allows one to improve mixing time estimates significantly.
Djellout, Guillin and Wu [DGW04] use contraction of Lipschitz constants and
transportation distances (without the link with Ricci curvature) in the
context of dependent sequences of random variables, to get Gaussian
concentration results. The link with the spectral gap appears in [Sam] (p. 94)
for the particular case of graphs, and is present in the works of Chen (e.g.
[CW97, Che98]).

From the discrete Markov chain point of view, the techniques presented 
here are just a metric version of the usual coupling method. Namely, Ricci 
curvature can be seen as a refined version of Dobrushin's ergodic coefficient 
(see [Dob56], or e.g. section 6.7.1 in [Bre99]) using the metric structure on
the underlying space. 

From the Riemannian point of view, our approach boils down to contraction
of the Lipschitz norm by the heat equation, which is one of the results
of Bakry and Emery [BE84, BE85], see also [ABCFGMRS00] and [RS05].
This latter property was suggested in [RS05] as a possible definition of a lower
bound on Ricci curvature for diffusion operators in general spaces, though it 
does not provide an explicit value for Ricci curvature at a given point. 
Notation. In the paper, we use the symbol $\approx$ to denote equality up to a
multiplicative universal constant (typically 2 or 4); the symbol $\sim$ denotes
usual asymptotic equivalence. The word "distribution" is used as a synonym
for "probability measure". 
1 Definitions and statements



1.1 Ricci curvature 

A common framework for generalizations of Ricci curvature is that of metric 
measure spaces [Stu06, LV]. However, most measures appear as the invariant
distribution of some process (e.g. Brownian motion on a Riemannian mani- 
fold), and it is more convenient and more general to start with a process in 
a metric space, as is the case in Bakry-Emery theory. See also Remark [5] 
below. 

Here for simplicity we will mainly consider the case of a discrete-time 
process. Similar definitions and results can be given for continuous time (see 
e.g. Section [333). 

Definition 1 - Let $(X, d)$ be a Polish metric space, equipped with its
Borel $\sigma$-algebra.

A random walk $m$ on $X$ is a family of probability measures $m_x(\cdot)$ on
$X$ for each $x \in X$, satisfying the following two technical assumptions: (i)
the measure $m_x$ depends measurably on the point $x \in X$; (ii) each
measure $m_x$ has finite first moment, i.e. for some (hence any) $o \in X$ one has
$\int d(o,y)\, dm_x(y) < \infty$.

This defines a Markov chain whose transition probability from $x$ to $y$ in
$n$ steps is

$$dm_x^{*n}(y) := \int_{z \in X} dm_x^{*(n-1)}(z)\, dm_z(y)$$

where of course $m_x^{*1} := m_x$.

Recall that a measure $\nu$ on $X$ is invariant for this random walk if
$d\nu(x) = \int_y d\nu(y)\, dm_y(x)$. It is reversible if moreover, the detailed
balance condition $d\nu(x)\, dm_x(y) = d\nu(y)\, dm_y(x)$ holds.
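
For a finite state space, the objects of Definition 1 can be sketched in a few lines. The following is only an illustration: the 3-state kernel and the measure `nu` are hypothetical values chosen for the example, and the helper names are not taken from the paper.

```python
from fractions import Fraction as F

# Hypothetical 3-state lazy birth-death chain on {0, 1, 2}; kernel[x][y] = m_x(y).
kernel = {
    0: {0: F(1, 2), 1: F(1, 2)},
    1: {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)},
    2: {1: F(1, 2), 2: F(1, 2)},
}

def step(measure, kernel):
    """One step of the walk applied to a measure: (mu * m)(y) = sum_x mu(x) m_x(y)."""
    out = {}
    for x, mass in measure.items():
        for y, p in kernel[x].items():
            out[y] = out.get(y, 0) + mass * p
    return out

def n_step(x, n, kernel):
    """Law of the chain after n steps started at x (the n-step transition measure)."""
    measure = {x: F(1)}
    for _ in range(n):
        measure = step(measure, kernel)
    return measure

def is_invariant(nu, kernel):
    """Invariance: nu(x) = sum_y nu(y) m_y(x) for every x."""
    return step(nu, kernel) == nu

def is_reversible(nu, kernel):
    """Detailed balance: nu(x) m_x(y) == nu(y) m_y(x) for all x, y."""
    return all(
        nu[x] * kernel[x].get(y, 0) == nu[y] * kernel[y].get(x, 0)
        for x in kernel for y in kernel
    )

# Candidate invariant distribution for the hypothetical chain above.
nu = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
```

Exact rational arithmetic is used so that invariance and detailed balance can be tested with equality rather than a tolerance.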

This allows us to define a notion of curvature as follows. Consider two very
close points $x, y$ in a Riemannian manifold, defining a tangent vector $(xy)$.
Let $w$ be another tangent vector at $x$; let $w'$ be the tangent vector at $y$
obtained by parallel transport of $w$ from $x$ to $y$. Now if we follow the two
geodesics issuing from $x, w$ and $y, w'$, in positive curvature the geodesics will
get closer, and will part away in negative curvature. Ricci curvature along
$(xy)$ is this phenomenon, averaged on all directions $w$ at $x$.

So in the general case, we will measure whether following the random 
walk issuing from two nearby points x, y results in points that are closer 
than x, y were, in which case Ricci curvature will be positive, or further 
apart, in which case Ricci curvature will be negative. This is made precise 
by the use of transportation distances between probability measures. We
refer to [Vil03] for an introduction to this topic.

Definition 2 - Let $(X,d)$ be a metric space and let $\nu_1, \nu_2$ be two
probability measures on $X$. The $L^1$ transportation distance between $\nu_1$
and $\nu_2$ is

$$\mathcal{T}_1(\nu_1,\nu_2) := \inf_{\xi \in \Pi(\nu_1,\nu_2)} \int_{(x,y) \in X \times X} d(x,y)\, d\xi(x,y)$$

where $\Pi(\nu_1,\nu_2)$ is the set of measures on $X \times X$ projecting to $\nu_1$ and $\nu_2$.

Intuitively, $d\xi(x,y)$ represents the mass that is sent from $x$ to $y$, hence
the constraint on the projections of $\xi$, ensuring that the initial measure is $\nu_1$
and the final measure is $\nu_2$.

The infimum is actually attained (Theorem 1.3 in [Vil03]), but the optimal
coupling is generally not unique. In what follows, it is enough to choose one
such coupling.

Definition 3 - Let $(X, d)$ be a metric space with a random walk $m$. Let
$x, y \in X$ be two distinct points. The Ricci curvature of $(X, d, m)$ in the
direction $(x, y)$ is

$$\kappa(x,y) := 1 - \frac{\mathcal{T}_1(m_x, m_y)}{d(x,y)}.$$

When $(X, d)$ is a Riemannian manifold, if the random walk consists in
randomly jumping in a ball of radius $\varepsilon$ around $x$, for small $\varepsilon$ and close enough
$x, y$ this definition captures the Ricci curvature in the direction $xy$ (up to
some factor depending on $\varepsilon$).
We will see below (Proposition [19]) that in geodesic spaces, it is enough
to know $\kappa(x,y)$ for close points $x, y$.

If a continuous-time Markov kernel is given, one can also define a
continuous-time version of the Ricci curvature by setting

$$\kappa(x,y) := -\frac{d}{dt}\, \frac{\mathcal{T}_1(m_x^t, m_y^t)}{d(x,y)}$$

when this derivative exists, but for simplicity we will mainly work with the
discrete-time version here. Indeed, for continuous-time Markov chains,
existence of the process is already a non-trivial issue. We will sometimes use our
results on concrete continuous-time examples (e.g. M/M/$\infty$ queues in
section [...]), but only when they appear as an obvious limit of a discrete-time
approximation.

One could use the $L^p$ transportation distance instead of the $L^1$ one in the
definition; however, though this will result in stronger assumptions, I did not
find any theorem where this would be necessary.

Notation - By analogy with the Riemannian case, when computing the
transportation distance between measures $m_x$ and $m_y$, we will think of $X \times X$
equipped with the coupling measure as a tangent space, and for $z \in X \times X$
we will write $x + z$ and $y + z$ for the two projections to $X$. So in this notation
we have

$$\kappa(x,y) = -\frac{1}{d(x,y)} \int \big(d(x+z, y+z) - d(x,y)\big)\, dz$$

where implicitly $dz$ is the optimal coupling between $m_x$ and $m_y$.



1.2 Examples 

Example 4 ($\mathbb{Z}^N$ and $\mathbb{R}^N$) - Let $m$ be the simple random walk on the
graph of the grid $\mathbb{Z}^N$ equipped with its graph metric. Then for any two
points $x, y \in \mathbb{Z}^N$, the Ricci curvature along $(xy)$ is 0.

Indeed, we can transport the measure $m_x$ around $x$ to the measure $m_y$
by a translation of vector $y - x$ (and this is optimal), so that the distance
between $m_x$ and $m_y$ is exactly that between $x$ and $y$.
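
The one-dimensional case $N = 1$ can be checked numerically. The sketch below assumes the standard fact that, on the line, the $L^1$ transportation distance between two measures equals the sum of the absolute differences of their cumulative distribution functions; the function names are illustrative and not from the paper.

```python
from fractions import Fraction as F

def w1_on_integers(mu, nu):
    """L^1 transportation distance between finitely supported measures on Z.

    Uses the one-dimensional formula T_1 = sum_k |F_mu(k) - F_nu(k)|,
    where F denotes the cumulative distribution function.
    """
    lo = min(min(mu), min(nu))
    hi = max(max(mu), max(nu))
    total, cdf_gap = F(0), F(0)
    for k in range(lo, hi):
        cdf_gap += mu.get(k, 0) - nu.get(k, 0)
        total += abs(cdf_gap)
    return total

def simple_walk(x):
    """Simple random walk on Z: jump to x - 1 or x + 1 with probability 1/2."""
    return {x - 1: F(1, 2), x + 1: F(1, 2)}

def curvature(kernel, x, y):
    """kappa(x, y) = 1 - T_1(m_x, m_y) / d(x, y), with d(x, y) = |x - y|."""
    return 1 - w1_on_integers(kernel(x), kernel(y)) / abs(x - y)
```

Calling `curvature(simple_walk, x, y)` for distinct integers gives the curvature of the pair; the higher-dimensional grids of the example would need a general optimal transport solver, which this sketch does not attempt.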

This example generalizes to the case of $\mathbb{Z}^N$ or $\mathbb{R}^N$ equipped with any
translation-invariant norm and any random walk given by a translation-
invariant transition kernel (consistently with [LV]). For example, the
triangular tiling of the plane has 0 curvature.

Remark 5 (Random walk at scale $\varepsilon$) - It is easy to construct
random walks on metric measure spaces. If $(X, d, \mu)$ is a metric measure
space (for example with $\mu$ the Hausdorff measure) and $\varepsilon > 0$, the random
walk at scale $\varepsilon$ consists in, starting at a point $x$, randomly jumping in the
ball of radius $\varepsilon$ around $x$, with probability density proportional to $\mu$; namely
$dm_x(y) := d\mu(y) / \mu(B(x, \varepsilon))$ if $d(x,y) \le \varepsilon$, and 0 otherwise (one can
also use other functions of the distance, such as a Gaussian kernel). This
allows us to consider the Ricci curvature associated with this random walk.

This is what we do now on Riemannian manifolds to get back the usual 
Ricci curvature (up to some normalization constants), hence the terminology. 

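Proposition 6 - Let $(X, d)$ be a smooth complete Riemannian manifold.
Let $v, w$ be unit tangent vectors at $x \in X$. Let $\varepsilon, \delta > 0$. Let $y = \exp_x \delta v$ and
let $w'$ be the tangent vector at $y$ obtained by parallel transport of $w$ along
the geodesic $\exp_x tv$. Then

$$d(\exp_x \varepsilon w, \exp_y \varepsilon w') = \delta \left(1 - \frac{\varepsilon^2}{2} K(v,w) + O(\varepsilon^3 + \varepsilon^2 \delta)\right)$$

as $(\varepsilon, \delta) \to 0$. Here $K(v,w)$ is the sectional curvature in the tangent plane
$(v,w)$.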

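Example 7 (Riemannian manifold) - Let $(X, d)$ be a smooth
complete $N$-dimensional Riemannian manifold. For some $\varepsilon > 0$, let the Markov
chain $m^\varepsilon$ be defined by

$$dm_x^\varepsilon(y) := \frac{1}{\mathrm{vol}(B(x,\varepsilon))}\, d\mathrm{vol}(y)$$

if $y \in B(x,\varepsilon)$, and 0 otherwise.

Let $x \in X$ and let $v$ be a unit tangent vector at $x$. Let $y$ be a point on
the geodesic issuing from $v$, with $d(x, y)$ small enough. Then

$$\kappa(x,y) = \frac{\varepsilon^2\, \mathrm{Ric}(v,v)}{2(N+2)} + O(\varepsilon^3 + \varepsilon^2 d(x,y)).$$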
Proof - This is essentially the same as Theorem 1.5 (condition (xii)) in [RS05],
except that therein, the infimum of Ricci curvature is used instead of its value
along a tangent vector. The proof is postponed to Section [8]. Basically, the
value of $\kappa(x, y)$ is obtained by averaging the proposition above for $w$ in the
unit ball of the tangent space at $x$, which provides an upper bound for $\kappa$.
The lower bound requires use of the dual characterization of transportation
distance (Theorem 1.14 in [Vil03]). □

Example 8 (Discrete cube) - Let $X = \{0,1\}^N$ be the discrete cube
equipped with the Hamming metric (each edge is of length 1). Let $m$ be the
lazy random walk on the graph $X$, i.e. $m_x(x) = 1/2$ and $m_x(y) = 1/2N$ if $y$
is a neighbor of $x$.

Let $x, y \in X$ be neighbors. Then $\kappa(x,y) = 1/N$.

This example generalizes to arbitrary binomial distributions (see
Section [3.3.3]).

Here laziness is necessary to avoid parity problems: if no laziness is
introduced, points at odd distance never meet under the random walk; in this
case one must consider Ricci curvature for points at even distance only.

Actually, since the discrete cube is a 1-geodesic space, one has $\kappa(x, y) \ge 1/N$
for any pair $x, y \in X$, not only neighbors (see Proposition [19]).

Proof - We can suppose that $x = 00\ldots0$ and $y = 10\ldots0$. For $z \in X$ and
$1 \le i \le N$, let us denote by $z^i$ the neighbor of $z$ in which the $i$-th bit is
switched. An optimal coupling between $m_x$ and $m_y$ is as follows: for $i \ge 2$,
move $x^i$ to $y^i$ (both have mass $1/2N$ under $m_x$ and $m_y$ respectively). Now
$m_x(x) = 1/2$ and $m_y(x) = 1/2N$, and likewise for $y$. To transport $m_x$ to $m_y$,
it is enough to move a mass $1/2 - 1/2N$ from $x$ to $y$. All points are moved
over a distance 1 by this coupling, except for a mass $1/2N$ which remains
at $x$ and a mass $1/2N$ which remains at $y$, and so the Ricci curvature is at
least $1/N$.

Optimality of this coupling is obtained as follows: Consider the function
$f : X \to \{0, 1\}$ which sends a point of $X$ to its first bit. This is a 1-Lipschitz
function, with $f(x) = 0$ and $f(y) = 1$. The expectations of $f$ under $m_x$ and
$m_y$ are $1/2N$ and $1 - 1/2N$ respectively, so that $1 - 1/N$ is a lower bound
on $\mathcal{T}_1(m_x, m_y)$.

A very short but less visual proof can be obtained through the $L^1$
tensorization property (Proposition [26]). □
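
The two halves of this proof (an explicit coupling for the upper bound, a 1-Lipschitz test function for the lower bound) can be checked mechanically for small $N$. The sketch below is an illustration with names of our own choosing; points of the cube are represented as tuples of bits.

```python
from fractions import Fraction as F

def flip(x, i):
    """Neighbor of x in which bit i (0-based) is switched."""
    return x[:i] + (1 - x[i],) + x[i + 1:]

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def lazy_cube_walk(x):
    """m_x for the lazy walk: stay with probability 1/2, flip bit i with 1/(2N)."""
    n = len(x)
    m = {x: F(1, 2)}
    for i in range(n):
        m[flip(x, i)] = F(1, 2 * n)
    return m

def proof_coupling(n):
    """Coupling of m_x and m_y described in the proof, x = 00...0, y = 10...0."""
    x = (0,) * n
    y = (1,) + (0,) * (n - 1)
    plan = {}
    for i in range(1, n):                     # bits 2..N: move x^i to y^i
        plan[(flip(x, i), flip(y, i))] = F(1, 2 * n)
    plan[(x, y)] = F(1, 2) - F(1, 2 * n)      # mass moved from x to y
    plan[(x, x)] = F(1, 2 * n)                # mass remaining at x
    plan[(y, y)] = F(1, 2 * n)                # mass remaining at y
    return x, y, plan

def marginals(plan):
    first, second = {}, {}
    for (a, b), mass in plan.items():
        first[a] = first.get(a, 0) + mass
        second[b] = second.get(b, 0) + mass
    return first, second

def cost(plan):
    """Transport cost of the coupling: an upper bound on T_1(m_x, m_y)."""
    return sum(mass * hamming(a, b) for (a, b), mass in plan.items())

def first_bit_gap(x, y):
    """E_{m_y} f - E_{m_x} f for f(z) = first bit: a lower bound on T_1(m_x, m_y)."""
    def mean(m):
        return sum(mass * z[0] for z, mass in m.items())
    return mean(lazy_cube_walk(y)) - mean(lazy_cube_walk(x))
```

When `cost(plan)` and `first_bit_gap(x, y)` agree, the common value is the transportation distance, and one minus it is the curvature for this pair of neighbors.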

Example 9 (Ornstein-Uhlenbeck process) - Let $s \ge 0$, $\alpha > 0$
and consider the Ornstein-Uhlenbeck process in $\mathbb{R}^N$ given by the stochastic
differential equation

$$dX_t = -\alpha X_t\, dt + s\, dB_t$$

where $B_t$ is a standard $N$-dimensional Brownian motion. The invariant
distribution is Gaussian, of variance $s^2/2\alpha$.

Let $\delta t > 0$ and let the random walk $m$ be the flow at time $\delta t$ of the
process. Explicitly, $m_x$ is a Gaussian probability measure centered at $e^{-\alpha \delta t} x$,
of variance $s^2 (1 - e^{-2\alpha \delta t})/2\alpha \sim s^2 \delta t$ for small $\delta t$.

Then the Ricci curvature $\kappa(x, y)$ of this random walk is $1 - e^{-\alpha \delta t}$, for any
two $x, y \in \mathbb{R}^N$.

Proof - The transportation distance between two Gaussian distributions
with the same variance is the distance between their centers, so that

$$\kappa(x,y) = 1 - \frac{|e^{-\alpha \delta t} x - e^{-\alpha \delta t} y|}{|x - y|}.$$

□

Example 10 (Discrete Ornstein-Uhlenbeck) - Let $X = \{-N, -N+1, \ldots, N-1, N\}$
and let $m$ be the random walk on $X$ given by

$$m_k(k) = 1/2, \quad m_k(k+1) = 1/4 - k/4N, \quad m_k(k-1) = 1/4 + k/4N$$

which is a lazy random walk with linear drift towards 0. The binomial
distribution $\frac{1}{2^{2N}} \binom{2N}{N+k}$ is reversible for this random walk.

Then, for any two neighbors $x, y$ in $X$, one has $\kappa(x, y) = 1/2N$.

Proof - Exercise. □
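
The exercise can be checked by direct computation for a given $N$. The sketch below assumes the kernel $m_k(k) = 1/2$, $m_k(k \pm 1) = 1/4 \mp k/4N$ and the binomial weights $\binom{2N}{N+k}/2^{2N}$, and uses the one-dimensional formula for the transportation distance (sum of absolute differences of cumulative distribution functions); function names are illustrative.

```python
from fractions import Fraction as F
from math import comb

def w1_on_integers(mu, nu):
    """T_1 on Z via cumulative distribution functions."""
    lo = min(min(mu), min(nu))
    hi = max(max(mu), max(nu))
    total, cdf_gap = F(0), F(0)
    for k in range(lo, hi):
        cdf_gap += mu.get(k, 0) - nu.get(k, 0)
        total += abs(cdf_gap)
    return total

def discrete_ou(k, n):
    """m_k on {-n, ..., n}: lazy walk with linear drift towards 0."""
    m = {
        k: F(1, 2),
        k + 1: F(1, 4) - F(k, 4 * n),
        k - 1: F(1, 4) + F(k, 4 * n),
    }
    # Drop zero-mass sites (they occur at the two endpoints k = -n and k = n).
    return {site: p for site, p in m.items() if p != 0}

def binomial_weight(k, n):
    """Binomial distribution on {-n, ..., n}, assumed reversible for the walk."""
    return F(comb(2 * n, n + k), 2 ** (2 * n))

def curvature(k, n):
    """kappa(k, k + 1) for the discrete Ornstein-Uhlenbeck walk."""
    return 1 - w1_on_integers(discrete_ou(k, n), discrete_ou(k + 1, n))
```

For each `n`, `curvature(k, n)` can be compared with `Fraction(1, 2 * n)` over all neighboring pairs, and detailed balance can be compared term by term with `binomial_weight`.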

Example 11 (Bakry-Emery) - Let $X$ be an $N$-dimensional Riemannian
manifold and $F$ be a tangent vector field. Consider the differential
operator

$$L := \frac{1}{2}\Delta + F.\nabla$$

associated with the stochastic differential equation

$$dX_t = F\, dt + dB_t$$

where $B_t$ is the Brownian motion in $X$. The Ricci curvature (in the
Bakry-Emery sense) of this operator is $\frac{1}{2}\mathrm{Ric} - \nabla^{\mathrm{sym}} F$ where
$\nabla^{\mathrm{sym}} F^{ij} := \frac{1}{2}(\nabla^i F^j + \nabla^j F^i)$ is the symmetrized of $\nabla F$.

Consider the Euler approximation scheme at time $\delta t$ for this stochastic
equation, which consists in following the flow of $F$ for a time $\delta t$ and then
randomly jumping in a ball of radius $\sqrt{(N+2)\delta t}$.

Let $x \in X$ and let $v$ be a unit tangent vector at $x$. Let $y$ be a point on
the geodesic issuing from $v$, with $d(x, y)$ small enough. Then

$$\kappa(x,y) = \delta t \left(\frac{1}{2}\mathrm{Ric}(v,v) - \nabla^{\mathrm{sym}} F(v,v) + O(d(x,y)) + O(\sqrt{\delta t})\right).$$
Proof - First let us explain the normalization: Jumping in a ball of radius
$\varepsilon$ generates a variance $\varepsilon^2 \frac{1}{N+2}$ in a given direction. On the other hand, the
$N$-dimensional Brownian motion has, by definition, a variance $dt$ per unit of
time $dt$ in any given direction, so a proper discretization at time $\delta t$ requires
jumping in a ball of radius $\sqrt{(N+2)\delta t}$. Also, as noted in [BE85], the
generator of Brownian motion is $\frac{1}{2}\Delta$ instead of $\Delta$, hence the $\frac{1}{2}$ factor for the Ricci
part.

Now the discrete-time process begins by following the flow $F$ for some
time $\delta t$. Starting at points $x$ and $y$, using elementary Euclidean geometry,
it is easy to see that after this, the distance between the endpoints behaves
like $d(x,y)(1 + \delta t\, v.\nabla_v F + O(\delta t^2))$. Note that $v.\nabla_v F = \nabla^{\mathrm{sym}} F(v,v)$.

Now, just as in Example 7, randomly jumping in a ball of radius $\varepsilon$ results
in a gain of $d(x,y)\, \frac{\varepsilon^2}{2(N+2)}\, \mathrm{Ric}(v,v)$ on transportation distances. Here
$\varepsilon^2 = (N+2)\delta t$. So after the two steps, the distance between the endpoints is

$$d(x,y) \left(1 - \frac{\delta t}{2}\, \mathrm{Ric}(v,v) + \delta t\, \nabla^{\mathrm{sym}} F(v,v)\right)$$

as needed, up to higher-order terms. □

Maybe the reason for the additional $-\nabla^{\mathrm{sym}} F$ in Ricci curvature à la
Bakry-Emery is made clearer in this context: it is simply the quantity by
which the flow of $F$ modifies distances between two starting points.

It is clear on this example why reversibility is not fundamental in this 
theory: the antisymmetric part of the force F generates an infinitesimal 
isometric displacement. Combining the Markov chain with an isometry of 
the space has no effect whatsoever on our definition. 

Example 12 (Multinomial distribution) - Consider the set
$X = \{(x_0, x_1, \ldots, x_d),\ x_i \in \mathbb{N},\ \sum x_i = N\}$ viewed as the configuration set of $N$
balls in $d + 1$ boxes. Consider the process which consists in taking a ball at
random among the $N$ balls, removing it from its box, and putting it back at
random in one of the $d + 1$ boxes. More precisely, the transition probability
from $(x_0, \ldots, x_d)$ to $(x_0, \ldots, x_i - 1, \ldots, x_j + 1, \ldots, x_d)$ (with maybe $i = j$) is
$x_i/N(d+1)$. The multinomial distribution $\frac{N!}{(d+1)^N \prod x_i!}$ is reversible for this
Markov chain.

Equip this configuration space with the metric $d((x_i), (x'_i)) := \frac{1}{2} \sum |x_i - x'_i|$
which is the graph distance w.r.t. the moves above. Then the Ricci curvature
of the Markov chain is $1/N$.

Proof - Exercise. □
Example 13 (Geometric distribution) - Let the random walk on
$\mathbb{N}$ be defined by the transition probabilities $p_{n,n+1} = 1/3$, $p_{n+1,n} = 2/3$ and
$p_{0,0} = 2/3$. This random walk is reversible with respect to the geometric
measure $2^{-(n+1)}$. It is easy to check that for $n \ge 1$ one has $\kappa(n, n+1) = 0$.

Proof - The transition kernel is translation-invariant except at 0. □
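
A direct computation confirms this. The sketch below assumes the kernel $p_{n,n+1} = 1/3$, $p_{n+1,n} = 2/3$, $p_{0,0} = 2/3$ and the weights $2^{-(n+1)}$, and again uses the one-dimensional formula for the transportation distance; names are illustrative.

```python
from fractions import Fraction as F

def w1_on_integers(mu, nu):
    """T_1 on Z via cumulative distribution functions."""
    lo = min(min(mu), min(nu))
    hi = max(max(mu), max(nu))
    total, cdf_gap = F(0), F(0)
    for k in range(lo, hi):
        cdf_gap += mu.get(k, 0) - nu.get(k, 0)
        total += abs(cdf_gap)
    return total

def geometric_walk(n):
    """m_n on N: up with probability 1/3, down with 2/3 (stay put at 0 instead)."""
    if n == 0:
        return {0: F(2, 3), 1: F(1, 3)}
    return {n - 1: F(2, 3), n + 1: F(1, 3)}

def geometric_weight(n):
    """Geometric measure 2^{-(n+1)} on N."""
    return F(1, 2 ** (n + 1))

def curvature(x, y):
    """kappa(x, y) = 1 - T_1(m_x, m_y) / |x - y|."""
    return 1 - w1_on_integers(geometric_walk(x), geometric_walk(y)) / abs(x - y)
```

The same function evaluated on the pair $(0, 1)$, where the boundary breaks translation invariance, returns $2/3$ rather than 0.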

Section [5] contains more material about this latter example and how non- 
negative Ricci curvature sometimes implies exponential concentration. 

Example 14 (Geometric distribution, 2) - Let the random walk
on $\mathbb{N}$ be defined by the transition probabilities $p_{n,0} = \alpha$ and $p_{n,n+1} = 1 - \alpha$
for some $0 < \alpha < 1$. The geometric distribution $\alpha(1 - \alpha)^n$ is invariant (but
not reversible) for this random walk. The Ricci curvature of this random
walk is $\alpha$.
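
This value can be verified for any pair of points, not only neighbors. The sketch below assumes the kernel $p_{n,0} = \alpha$, $p_{n,n+1} = 1 - \alpha$ and uses the one-dimensional formula for the transportation distance; the name `reset_walk` is ours.

```python
from fractions import Fraction as F

def w1_on_integers(mu, nu):
    """T_1 on Z via cumulative distribution functions."""
    lo = min(min(mu), min(nu))
    hi = max(max(mu), max(nu))
    total, cdf_gap = F(0), F(0)
    for k in range(lo, hi):
        cdf_gap += mu.get(k, 0) - nu.get(k, 0)
        total += abs(cdf_gap)
    return total

def reset_walk(n, alpha):
    """m_n on N: jump back to 0 with probability alpha, else move to n + 1."""
    return {0: alpha, n + 1: 1 - alpha}

def curvature(x, y, alpha):
    """kappa(x, y) = 1 - T_1(m_x, m_y) / |x - y|."""
    t1 = w1_on_integers(reset_walk(x, alpha), reset_walk(y, alpha))
    return 1 - t1 / abs(x - y)
```

The common mass $\alpha$ at 0 does not need to move, and the remaining mass $1 - \alpha$ travels exactly $|x - y|$, which is why the result does not depend on the pair.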

Example 15 ($\delta$-hyperbolic groups) - Let $X$ be the Cayley graph of
a non-elementary $\delta$-hyperbolic group with respect to some finite generating
set. Let $k$ be a large enough integer (depending on the group) and consider
the random walk consisting in performing $k$ steps of the simple random walk.
Let $x, y \in X$ with $d(x,y) > 2k$. Then $\kappa(x,y) = -2k/d(x,y) + O(1/d(x,y))$.

Note that $-2k/d(x,y)$ is the smallest possible value for $\kappa(x,y)$, knowing
that the steps of the random walk are bounded by $k$.

Proof - For $z$ in the ball of radius $k$ around $x$, and $z'$ in the ball of radius
$k$ around $y$, elementary $\delta$-hyperbolic geometry yields
$d(z, z') = d(x, y) + d(x, z) + d(y, z') - 2(y, z)_x - 2(x, z')_y$ up to some multiple
of $\delta$, where $(\cdot, \cdot)$ denotes the Gromov product with respect to some
basepoint [GH90]. Since
this decomposes as the sum of a term depending on $z$ only and a term
depending on $z'$ only, to compute the transportation distance it is enough to
study the expectation of $(y, z)_x$ for $z$ in the ball around $x$, and likewise for
$(x, z')_y$. Knowing that balls have exponential growth, it is not difficult to see
that the expectation of $(y, z)_x$ is bounded by a constant, whatever $k$, hence
the conclusion.

The same argument applies to trees or discrete $\delta$-hyperbolic spaces with
a uniform lower bound on the exponential growth rate of balls. □

Example 16 (Kac's random walk on orthogonal matrices,
after [Oli]) - Consider the following random walk on the set of $N \times N$
orthogonal matrices: at each step, a pair of indices $1 \le i < j \le N$ is selected
at random, an angle $\theta \in [0; 2\pi)$ is picked at random, and a rotation of angle
$\theta$ is performed in the coordinate plane $i, j$. Equip the set of orthogonal
matrices with the Riemannian metric on $SO(N)$ induced by the Hilbert-
Schmidt inner product $\mathrm{Tr}(a^* b)$ on its tangent space. It is proven in a preprint
by Oliveira [Oli] that this random walk has curvature
$1 - \sqrt{1 - 2/N(N-1)} \sim 1/N^2$.

This is consistent with the fact that $SO(N)$ has, as a Riemannian manifold,
a positive Ricci curvature in the usual sense. However, from the
computational point of view, Kac's random walk above is much nicer than either the
Brownian motion or the $\varepsilon$-scale random walk of Example 7. Oliveira uses this
result to prove a new estimate $O(N^2 \ln N)$ for the mixing time of this random
walk, nicely improving on previous estimates $O(N^4 \ln N)$ by Diaconis-Saloff-
Coste and $O(N^{2.5} \ln N)$ by Pak-Sidenko (an easy lower bound is $\Omega(N^2)$),
see [Oli].

Example 17 (Glauber dynamics for the Ising model) - Let
$G$ be a finite graph. Consider the configuration space $X := \{-1, 1\}^G$
together with the energy function $U(S) := -\sum_{x \sim y \in G} S(x)S(y) - H \sum_x S(x)$
for $S \in X$, where $H \in \mathbb{R}$ is the external magnetic field. For some $\beta \ge 0$, equip
$X$ with the Gibbs distribution $\mu := e^{-\beta U}/Z$ where as usual $Z := \sum_S e^{-\beta U(S)}$.
The distance between two states is defined as the number of vertices of $G$ at
which their values differ.

For $S \in X$ and $x \in G$, denote by $S_{x+}$ and $S_{x-}$ the states obtained from
$S$ by setting $S_{x+}(x) = +1$ and $S_{x-}(x) = -1$, respectively. Consider the
following random walk on $X$ (known as the Glauber dynamics): at each step,
a vertex $x \in G$ is chosen at random, and a new value for $S(x)$ is picked
according to local equilibrium, i.e. $S(x)$ is set to $1$ or $-1$ with probabilities
proportional to $e^{-\beta U(S_{x+})}$ and $e^{-\beta U(S_{x-})}$ respectively (note that only the
neighbors of $x$ influence the ratio of these probabilities). The Gibbs
distribution is reversible for this Markov chain.

Then the Ricci curvature of this Markov chain is at least

$$\frac{1}{|G|} \left(1 - v_{\max}\, \frac{e^{\beta} - e^{-\beta}}{e^{\beta} + e^{-\beta}}\right)$$

where $v_{\max}$ is the maximal valency of a vertex of $G$. In particular, if

$$\beta < \frac{1}{2} \ln \left(\frac{v_{\max} + 1}{v_{\max} - 1}\right)$$

then curvature is positive. Consequently, the critical $\beta$ is at least this
quantity.

This estimate for the critical temperature coincides exactly with the one
derived in [Gri67]: actually our argument generalizes to non-constant values
of the coupling $J_{xy}$ between spins, and the positive curvature condition
exactly amounts to $G(\beta) < 1$ in that paper's notation ([Gri67], Eq. (19)), or,
equivalently, to Dobrushin's criterion using a single site. For comparison, the
exact value of the critical $\beta$ for the Ising model on the regular infinite tree
of valency $v$ is $\frac{1}{2} \ln \left(\frac{v}{v-2}\right)$, which shows asymptotic optimality.
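
The threshold and its asymptotic optimality are easy to explore numerically. The sketch below assumes the lower bound has the form $\frac{1}{|G|}(1 - v_{\max} \tanh \beta)$, using $\tanh \beta = (e^{\beta} - e^{-\beta})/(e^{\beta} + e^{-\beta})$, together with the threshold $\frac{1}{2}\ln\frac{v_{\max}+1}{v_{\max}-1}$ and the tree value $\frac{1}{2}\ln\frac{v}{v-2}$; function names are ours.

```python
import math

def glauber_curvature_bound(num_vertices, v_max, beta):
    """Assumed lower bound (1/|G|) * (1 - v_max * tanh(beta)) on the curvature."""
    return (1 - v_max * math.tanh(beta)) / num_vertices

def beta_threshold(v_max):
    """Largest beta for which the bound is positive; requires v_max > 1."""
    return 0.5 * math.log((v_max + 1) / (v_max - 1))

def beta_critical_tree(v):
    """Critical beta on the regular infinite tree of valency v; requires v > 2."""
    return 0.5 * math.log(v / (v - 2))

if __name__ == "__main__":
    for v in (3, 4, 10, 100):
        ratio = beta_threshold(v) / beta_critical_tree(v)
        print(v, beta_threshold(v), beta_critical_tree(v), ratio)
```

The threshold is the solution of $v_{\max} \tanh \beta = 1$, and the printed ratio approaches 1 as the valency grows, which is one way to read the asymptotic optimality claim.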

As shown in the rest of this paper, positive curvature implies several prop- 
erties, especially, exponential convergence to the equilibrium, concentration 
inequalities and a modified logarithmic Sobolev inequality. I do not know 
how these results compare to the literature. 

Since the argument presented below does not rely on exact solutions but 
on quantitative estimates, it is obviously not specific to the Ising model: the 
only property we used is that the influence of a vertex on the local equilibrium 
of its neighbors is bounded. 

Proof - Using Proposition [19] it is enough to bound Ricci curvature for
pairs of states at distance 1. Let $S, S'$ be two states differing only at $x \in G$.
We can suppose that $S(x) = -1$ and $S'(x) = 1$. Let $m_S$ and $m_{S'}$ be the law
of the step of the random walk issuing from $S$ and $S'$ respectively. We have
to prove that the transportation distance between $m_S$ and $m_{S'}$ is at most

$$1 - \frac{1}{|G|} \left(1 - v_{\max}\, \frac{e^{\beta} - e^{-\beta}}{e^{\beta} + e^{-\beta}}\right).$$

The measure $m_S$ decomposes as $m_S = \frac{1}{|G|} \sum_{y \in G} m_S^y$ according to the
vertex $y \in G$ which is modified by the random walk, and likewise for $m_{S'}$.
To evaluate the transportation distance, we will compare $m_S^y$ to $m_{S'}^y$.

If the step of the random walk consists in modifying the value of $S$ at
$x$ (which occurs with probability $1/|G|$), then the resulting state has the
same law for $S$ and $S'$, i.e. $m_S^x = m_{S'}^x$. Thus in this case the transportation
distance is 0 and the contribution to Ricci curvature is $1 \times \frac{1}{|G|}$.

If the step consists in modifying the value of $S$ at some point $y$ in $G$ not
adjacent to $x$, then the value at $x$ does not influence local equilibrium at $y$,
and so $m_S^y$ and $m_{S'}^y$ are identical except at $x$. So in this case the distance is
1 and the contribution to Ricci curvature is 0.

Now if the step consists in modifying the value of $S$ at some point $y \in G$
adjacent to $x$ (which occurs with probability $v_x/|G|$ where $v_x$ is the valency
of $x$), then the value at $x$ does influence the law of the new value at $y$, by
some amount which we now evaluate. The final distance between the two
laws will be this amount plus 1 (1 accounts for the difference at $x$), and the
contribution to Ricci curvature will be negative.

Let us now evaluate this amount more precisely. Let $y \in G$ be adjacent to
$x$. Set $a = e^{-\beta U(S_{y+})}/e^{-\beta U(S_{y-})}$. The step of the random walk consists in
setting $S(y)$ to $1$ with probability $\frac{a}{a+1}$ and to $-1$ with probability $\frac{1}{a+1}$. Setting
likewise $a' = e^{-\beta U(S'_{y+})}/e^{-\beta U(S'_{y-})}$ for $S'$, we are left to evaluate the distance
between the distributions on $\{-1, 1\}$ given by $\left(\frac{a}{a+1}; \frac{1}{a+1}\right)$ and $\left(\frac{a'}{a'+1}; \frac{1}{a'+1}\right)$. It
is immediate to check, using the definition of the energy $U$, that $a' = e^{4\beta} a$.
Then, a simple computation shows that the distance between these two
distributions is at most $\frac{e^{\beta} - e^{-\beta}}{e^{\beta} + e^{-\beta}}$. This value is actually achieved when $y$ has odd
valency, $H = 0$ and switching the value at $x$ changes the majority around
$y$. (Our argument is suboptimal here when valency is even: a more precise
estimation yields the absence of a phase transition in dimension 1.)

Combining these different cases yields the desired curvature evaluation.
To convert this into an evaluation of the critical $\beta$, reason as follows:
Magnetization, defined as $\frac{1}{|G|} \sum_{x \in G} S(x)$, is a $\frac{1}{|G|}$-Lipschitz function of the state.
Now let $\mu_0$ be the Gibbs measure without magnetic field, and $\mu_h$ the Gibbs
measure with external magnetic field $h$. Use the Glauber dynamics with
magnetic field $h$, but starting with an initial state picked under $\mu_0$; Cor. [22] yields
that the magnetization under $\mu_h$ is controlled by $\frac{1}{|G|} \mathcal{T}_1(\mu_0, \mu_0 * m)/\kappa$ where
$\kappa$ is the Ricci curvature, and $\mathcal{T}_1(\mu_0, \mu_0 * m)$ is the transportation distance
between the Gibbs measure $\mu_0$ and the measure obtained from it after one
step of the Glauber dynamics with magnetic field $h$; reasoning as above this
transportation distance is easily bounded by [...], so that the derivative
of the magnetization w.r.t. $h$ stays bounded when $|G| \to \infty$. (Compare
Eq. (22) in [Gri67].) □

More examples can be found in Sections [3.3.3] (binomial and Poisson
distributions), [...] (M/M/$\infty$ queues and generalizations) and [5] (geometric
distributions on $\mathbb{N}$, exponential distributions on $\mathbb{R}^N$).

1.3 Overview of the results 

Notation for random walks. Before presenting the main results, we need 
some more quantities related to the local behavior of the random walk: the
jump, which will help control the diameter of the space, and the spread, which 
is the analogue of a diffusion constant and will help control concentration 
properties. Moreover, we define a notion of local dimension. The larger the 
dimension, the better for concentration of measure. 

Definition 18 (Jump, spread, dimension) - Let the jump of the
random walk at $x$ be

$$J(x) := \mathbb{E}_{m_x} d(x, \cdot) = \mathcal{T}_1(\delta_x, m_x).$$

Let the spread of the random walk at $x$ be

$$\sigma(x) := \left(\frac{1}{2} \iint d(y,z)^2\, dm_x(y)\, dm_x(z)\right)^{1/2}$$

and, if $\nu$ is an invariant distribution, let

$$\sigma := \|\sigma(x)\|_{L^2(X,\nu)}$$

be the average spread.

Let also $\sigma_\infty(x) := \frac{1}{2} \operatorname{diam} \operatorname{Supp} m_x$ and $\sigma_\infty := \sup \sigma_\infty(x)$.

Let the local dimension at $x$ be

$$n_x := \frac{\sigma(x)^2}{\sup\{\mathrm{Var}_{m_x} f,\ f \text{ 1-Lipschitz}\}}$$

and finally $n := \inf_x n_x$.
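
For a finitely supported $m_x$, the jump and the spread are finite sums. The sketch below computes them for the lazy random walk on the discrete cube, assuming $J(x) = \sum_y m_x(y)\, d(x,y)$ and $\sigma(x)^2 = \frac{1}{2} \sum_{y,z} d(y,z)^2\, m_x(y)\, m_x(z)$; the local dimension is not computed, since it involves a supremum over all 1-Lipschitz functions. Names are illustrative.

```python
from fractions import Fraction as F

def flip(x, i):
    """Neighbor of x in which bit i (0-based) is switched."""
    return x[:i] + (1 - x[i],) + x[i + 1:]

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def lazy_cube_walk(x):
    """m_x for the lazy walk on {0,1}^N: stay w.p. 1/2, flip bit i w.p. 1/(2N)."""
    n = len(x)
    m = {x: F(1, 2)}
    for i in range(n):
        m[flip(x, i)] = F(1, 2 * n)
    return m

def jump(x, m_x, dist):
    """J(x): expected distance travelled in one step from x."""
    return sum(p * dist(x, y) for y, p in m_x.items())

def spread_squared(m_x, dist):
    """sigma(x)^2 = (1/2) * sum over pairs (y, z) of d(y, z)^2 m_x(y) m_x(z)."""
    return F(1, 2) * sum(
        p * q * dist(y, z) ** 2
        for y, p in m_x.items()
        for z, q in m_x.items()
    )

x = (0, 0, 0)
m_x = lazy_cube_walk(x)
jump_at_x = jump(x, m_x, hamming)
spread_sq_at_x = spread_squared(m_x, hamming)
```

Working with $\sigma(x)^2$ keeps the arithmetic exact; a square root can be taken at the end if the spread itself is needed.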



About this definition of dimension. Obviously $n_x \ge 1$. For the discrete-
time Brownian motion on an $N$-dimensional Riemannian manifold, one has
$n_x \approx N$ (see the end of Section [8]). For the simple random walk on a graph,
$n_x \approx 1$. This definition of dimension amounts to saying that in a space of
dimension $n$, the typical variations of a (1-dimensional) Lipschitz function
are $1/\sqrt{n}$ times the typical distance between two points. This is the case
in the sphere $S^n$, in the Gaussian measure on $\mathbb{R}^n$, and in the discrete cube
$\{0, 1\}^n$. So generally one could define the "statistical dimension" of a metric
measure space $(X, d, \mu)$ by this formula i.e.

$$\mathrm{StatDim}(X, d, \mu) := \frac{\frac{1}{2} \iint d(x,y)^2\, d\mu(x)\, d\mu(y)}{\sup\{\mathrm{Var}_\mu f,\ f \text{ 1-Lipschitz}\}}$$

so that for each $x \in X$ the local dimension of $X$ at $x$ is $n_x = \mathrm{StatDim}(X, d, m_x)$.
With this definition, $\mathbb{R}^N$ equipped with a Gaussian measure has statistical
dimension $N$ and local dimension $\approx N$, whereas the discrete cube $\{0, 1\}^N$
has statistical dimension $\approx N$ and local dimension $\approx 1$.

We now turn to the description of the main results of the paper. 



Elementary properties. In Section 2 are gathered some straightforward
results.

First, we prove (Proposition 19) that in an $\varepsilon$-geodesic space, it is enough
to get a lower bound on $\kappa(x, y)$ for points $x, y$ with $d(x, y) \leq \varepsilon$, to get a lower
bound on $\kappa$ for all pairs of points. This is simple yet very useful: indeed in
the various graphs given above as examples, it was enough to compute the
Ricci curvature for neighbors.

Second, we prove equivalent characterizations of having Ricci curvature
uniformly bounded from below: A space satisfies $\kappa(x, y) \geq \kappa$ if and only if
the random walk operator is $(1 - \kappa)$-contracting on the space of probability
measures equipped with the transportation distance (Proposition 20), and if
and only if the random walk operator acting on Lipschitz functions contracts
the Lipschitz norm by $(1 - \kappa)$ (Proposition 28). An immediate corollary of
the contracting property for probability measures is the existence of a unique
invariant distribution when $\kappa > 0$.

The property of contraction of the Lipschitz norm implies, in the
reversible case, that the spectral gap of the Laplacian operator associated with
the random walk is at least $\kappa$; this can be seen as a generalization of
Lichnerowicz' theorem, and provides sharp estimates of the spectral gap in several
examples.

In analogy with the Bonnet-Myers theorem, we prove that if Ricci
curvature is bounded below by $\kappa > 0$, then the diameter of the space is at most
$2 \sup_x J(x)/\kappa$ (Proposition 23). In case $J$ is unbounded, we can evaluate
instead the average distance to a given point $x_0$ under the invariant distribution
$\nu$ (Proposition 24); namely, $\int d(x_0, y) \, d\nu(y) \leq J(x_0)/\kappa$. In particular we have
$\int d(x,y) \, d\nu(x) \, d\nu(y) \leq 2 \inf J/\kappa$. These are $L^1$ versions of the Bonnet-Myers
theorem rather than generalizations: from the case of manifolds one would
expect $1/\sqrt{\kappa}$ instead of $1/\kappa$. Actually this $L^1$ version is sharp in all our
examples except Riemannian manifolds; in Section 7 we investigate additional
conditions for an $L^2$ version of the Bonnet-Myers theorem to hold.

Let us also mention two elementary constructions preserving positive
curvature, namely, superposition and $L^1$ tensorization (Propositions 25 and 26).

Concentration results. Basically, if Ricci curvature is bounded below by
$\kappa > 0$, then the invariant distribution satisfies concentration results with
variance $\sigma^2/n\kappa$ (up to some constant factor). This estimate is often sharp,
as discussed in Section 3.3 where we revisit some of the examples.

However, the type of concentration (Gaussian, exponential, or $1/t^2$)
depends on further local assumptions: indeed, just as in the central limit
theorem, positive Ricci curvature can only carry at the global scale what is already
true at the local scale. Without further assumptions, one only gets that the
maximal variance of a 1-Lipschitz function is at most $\sigma^2/n\kappa$, hence
concentration like $\sigma^2/n\kappa t^2$ (Proposition 31). If we make the further assumption
that the support of the measures $m_x$ is uniformly bounded (i.e. $\sigma_\infty < \infty$),
then we get mixed Gaussian-then-exponential concentration, with variance
$\sigma^2/n\kappa$ (Theorem 32). The width of the Gaussian window depends on $\sigma_\infty$,
and on the rate of variation of the spread $\sigma(x)^2$.

For the case of Riemannian manifolds, simply taking smaller and smaller
steps for the random walks makes the width of the Gaussian window tend to
infinity, so that we recover Gaussian concentration as in the Lévy-Gromov or
Bakry-Émery context. However, for lots of discrete examples, the
Gaussian-then-exponential behavior is genuine. Examples where tails are
Poisson-like (binomial distribution, M/M/$\infty$ queues) or exponential are given in
Sections 3.3.3 to 3.3.5.

We also get concentration results for the finite-time distributions $m_x^{*k}$
(Remark 34).

Log-Sobolev inequality. Using a suitable non-local notion of norm of
the gradient, we are able to mimic the proof by Bakry and Émery of a
logarithmic Sobolev inequality for the invariant distribution. The gradient we
use (Definition 37) is $(Df)(x) := \sup_{y,z} \frac{|f(y) - f(z)|}{d(y,z)} \exp(-\lambda d(x, y) - \lambda d(x, z))$.
This is a kind of "semi-local" Lipschitz constant for $f$. Typically the value of
$\lambda$ can be taken large at the "macroscopic" level; for Riemannian manifolds,
taking smaller and smaller steps for the random walk allows one to take $\lambda \to \infty$
so that we recover the usual gradient for smooth functions.

The inequality takes the form $\operatorname{Ent} f \leq C \int (Df)^2/f \, d\nu$ (Theorem 40). The
main tool of the proof is the contraction relation $D(Mf) \leq (1 - \kappa/2) M(Df)$
where $M$ is the random walk operator (Proposition 43).

That the gradient is non-local, with a maximal possible value of $\lambda$, is
consistent with the possible occurrence of non-Gaussian tails.

Exponential concentration and non-negative curvature. The
simplest example of a Markov chain with zero Ricci curvature is the simple
random walk on $\mathbb{N}$ or $\mathbb{Z}$, for which there is no invariant distribution.
However, we show that if furthermore there is a "locally attracting" point, then
non-negative Ricci curvature implies exponential concentration. The main
examples are the geometric distribution on $\mathbb{N}$, and the exponential
distribution $e^{-|x|}$ on $\mathbb{R}^N$ associated with the stochastic differential equation
$dX_t = dB_t - \frac{X_t}{|X_t|} \, dt$. In both cases we recover correct orders of magnitude.

Gromov-Hausdorff topology. One advantage of our definition is that it
involves only combinations of the distance function, and no derivatives, so
that it is more or less impervious to deformations of the space. In Section 6
we show that Ricci curvature is continuous for Gromov-Hausdorff
convergence of metric spaces (suitably reinforced, of course, so that the random
walk converges as well), so that having non-negative curvature is a closed
property. We also suggest a loosened definition of Ricci curvature, requiring
that $\mathcal{T}_1(m_x, m_y) \leq (1 - \kappa) d(x, y) + \delta$ instead of $\mathcal{T}_1(m_x, m_y) \leq (1 - \kappa) d(x, y)$.
With this definition, positive curvature becomes an open property, so that a
space close to one with positive curvature has positive curvature. Properties
of this loose version will be investigated in another paper.



2 Elementary properties 

2.1 Geodesic spaces 

The idea behind curvature is to use local properties to derive global ones. We 
give here a simple proposition expressing that in near-geodesic spaces, such 
as graphs or manifolds, it is enough to check positivity of Ricci curvature for 
nearby points. 

Proposition 19 - Suppose that $(X, d)$ is $\varepsilon$-geodesic in the sense that
for any two points $x, y \in X$, there exists an integer $n$ and a sequence $x_0 = x,
x_1, \ldots, x_n = y$ such that $d(x_i, x_{i+1}) \leq \varepsilon$ and $d(x, y) = \sum d(x_i, x_{i+1})$.

Then, if $\kappa(x, y) \geq \kappa$ for any pair of points with $d(x, y) \leq \varepsilon$, then $\kappa(x, y) \geq
\kappa$ for any pair of points $x, y \in X$.

Proof - Since $\mathcal{T}_1$ is a distance, one has

$$\mathcal{T}_1(m_x, m_y) \leq \sum \mathcal{T}_1(m_{x_i}, m_{x_{i+1}}) \leq (1 - \kappa) \sum d(x_i, x_{i+1}).$$ □
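
As a sanity check of this local-to-global principle, here is a minimal sketch on a 1-geodesic space, the path $\{0, \ldots, n\}$ with $d(i,j) = |i-j|$. The chain used (`step_law`) is a hypothetical lazy birth-and-death chain chosen only for illustration: it stays put with probability 1/2, moves up with probability $(n-k)/2n$ and down with probability $k/2n$. On a line the transportation distance between two laws equals the sum of the absolute differences of their cumulative distribution functions, which is what `w1_on_line` computes.

```python
from fractions import Fraction

def step_law(k, n):
    """m_k for an illustrative lazy birth-and-death chain on {0, ..., n}."""
    law = {k: Fraction(1, 2)}
    if k < n:
        law[k + 1] = Fraction(n - k, 2 * n)
    if k > 0:
        law[k - 1] = Fraction(k, 2 * n)
    return law

def w1_on_line(mu, nu, n):
    """T_1 between two laws on {0, ..., n}: sum over i of |F_mu(i) - F_nu(i)|."""
    total, gap = Fraction(0), Fraction(0)
    for i in range(n):  # the two CDFs agree at i = n
        gap += mu.get(i, 0) - nu.get(i, 0)
        total += abs(gap)
    return total

def kappa(x, y, n):
    # kappa(x, y) = 1 - T_1(m_x, m_y) / d(x, y)
    return 1 - w1_on_line(step_law(x, n), step_law(y, n), n) / abs(x - y)

n = 6
neighbour_curvatures = [kappa(k, k + 1, n) for k in range(n)]
all_curvatures = [kappa(x, y, n)
                  for x in range(n + 1) for y in range(n + 1) if x != y]
print(min(neighbour_curvatures), min(all_curvatures))
```

For this chain the curvature between neighbors is $1/n$, and the lower bound $1/n$ indeed carries over to all pairs of points.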

2.2 Contraction on the space of probability measures 

Let $\mathcal{P}(X)$ be the space of all probability measures $\mu$ on $X$ with finite first
moment, i.e. for some (hence any) $o \in X$, $\int d(o, x) \, d\mu(x) < \infty$. On $\mathcal{P}(X)$,
the transportation distance $\mathcal{T}_1$ is finite, so that it is actually a distance.
Let $\mu$ be a probability measure on $X$ and define the measure

$$\mu * m := \int_{x \in X} d\mu(x) \, m_x$$

which is the image of $\mu$ by the random walk. (It may or may not belong to
$\mathcal{P}(X)$.)

The following proposition also appears in [DGW04] (in the proof of
Proposition 2.10) and in [DTI] .

Proposition 20 - Let $(X, d, m)$ be a metric space with a random walk.
Let $\kappa \in \mathbb{R}$. Then we have $\kappa(x, y) \geq \kappa$ for all $x, y \in X$, if and only if for
any two probability distributions $\mu, \mu' \in \mathcal{P}(X)$ one has

$$\mathcal{T}_1(\mu * m, \mu' * m) \leq (1 - \kappa) \mathcal{T}_1(\mu, \mu').$$

Moreover in this case, if $\mu \in \mathcal{P}(X)$ then $\mu * m \in \mathcal{P}(X)$.
Proof - First, suppose that convolution with $m$ is contracting in $\mathcal{T}_1$
distance. For some $x, y \in X$, let $\mu = \delta_x$ and $\mu' = \delta_y$ be the Dirac measures
at $x$ and $y$. Then by definition $\delta_x * m = m_x$ and likewise for $y$, so that
$\mathcal{T}_1(m_x, m_y) \leq (1 - \kappa) \mathcal{T}_1(\delta_x, \delta_y) = (1 - \kappa) d(x, y)$ as required.

The converse is more difficult to write than to understand. For each pair
$(x, y)$ let $\xi_{xy}$ be a coupling (i.e. a measure on $X \times X$) between $m_x$ and $m_y$
witnessing for $\kappa(x, y) \geq \kappa$. According to Corollary 5.22 in [Vil], we can choose
$\xi_{xy}$ to depend measurably on the pair $(x, y)$. Let $\Xi$ be a coupling between $\mu$
and $\mu'$ witnessing for $\mathcal{T}_1(\mu, \mu')$. Then $\int_{X \times X} d\Xi(x, y) \, \xi_{xy}$ is a coupling between
$\mu * m$ and $\mu' * m$ and so

$$\begin{aligned}
\mathcal{T}_1(\mu * m, \mu' * m) &\leq \int_{x,y} d(x,y) \, d\left\{ \int_{x',y'} d\Xi(x',y') \, \xi_{x',y'} \right\}(x,y) \\
&= \int_{x,y,x',y'} d\Xi(x',y') \, d\xi_{x',y'}(x,y) \, d(x,y) \\
&\leq \int_{x',y'} d\Xi(x',y') \, d(x',y') \, (1 - \kappa(x',y')) \\
&\leq (1 - \kappa) \, \mathcal{T}_1(\mu, \mu')
\end{aligned}$$

by the Fubini theorem applied to $d(x, y) \, d\Xi(x', y') \, d\xi_{x',y'}(x, y)$.

To see that in this situation $\mathcal{P}(X)$ is preserved by the random walk, fix
some origin $o \in X$ and note that for any $\mu \in \mathcal{P}(X)$, the first moment of $\mu * m$
is $\mathcal{T}_1(\delta_o, \mu * m) \leq \mathcal{T}_1(\delta_o, m_o) + \mathcal{T}_1(m_o, \mu * m) \leq \mathcal{T}_1(\delta_o, m_o) + (1 - \kappa) \mathcal{T}_1(\delta_o, \mu)$.
Now $\mathcal{T}_1(\delta_o, \mu) < \infty$ by assumption, and $\mathcal{T}_1(\delta_o, m_o) < \infty$ by our definition of
random walks (Definition 1). □

As an immediate consequence of this contracting property we get:

Corollary 21 - Suppose that $\kappa(x, y) \geq \kappa > 0$ for any two distinct $x, y \in
X$. Then the random walk has a unique invariant distribution $\nu \in \mathcal{P}(X)$.

Moreover, for any probability measure $\mu \in \mathcal{P}(X)$, the sequence $\mu * m^{*n}$
tends exponentially fast to $\nu$ in $\mathcal{T}_1$ distance. Namely

$$\mathcal{T}_1(\mu * m^{*n}, \nu) \leq (1 - \kappa)^n \, \mathcal{T}_1(\mu, \nu)$$

and in particular

$$\mathcal{T}_1(m_x^{*n}, \nu) \leq (1 - \kappa)^n J(x)/\kappa.$$

The last assertion follows by taking $\mu = \delta_x$ and noting that $J(x) =
\mathcal{T}_1(\delta_x, m_x)$ so that $\mathcal{T}_1(\delta_x, \nu) \leq \mathcal{T}_1(\delta_x, m_x) + \mathcal{T}_1(m_x, \nu) \leq J(x) + (1 - \kappa) \mathcal{T}_1(\delta_x, \nu)$,
hence $\mathcal{T}_1(\delta_x, \nu) \leq J(x)/\kappa$.
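
The exponential convergence in $\mathcal{T}_1$ distance can be observed numerically. The sketch below is only an illustration under stated assumptions: it uses a hypothetical lazy birth-and-death chain on $\{0, \ldots, n\}$ (stay with probability 1/2, up with probability $(n-k)/2n$, down with probability $k/2n$), for which $\kappa = 1/n$, $J \equiv 1/2$ and the invariant distribution is binomial. The function names are ours.

```python
from fractions import Fraction
from math import comb

def step_law(k, n):
    """m_k for an illustrative lazy birth-and-death chain on {0, ..., n}."""
    law = {k: Fraction(1, 2)}
    if k < n:
        law[k + 1] = Fraction(n - k, 2 * n)
    if k > 0:
        law[k - 1] = Fraction(k, 2 * n)
    return law

def push(mu, n):
    """The measure mu * m, i.e. the image of mu by one step of the walk."""
    out = {}
    for k, p in mu.items():
        for j, q in step_law(k, n).items():
            out[j] = out.get(j, 0) + p * q
    return out

def w1_on_line(mu, nu, n):
    """T_1 on {0, ..., n} with d(i, j) = |i - j|, via cumulative distributions."""
    total, gap = Fraction(0), Fraction(0)
    for i in range(n):
        gap += mu.get(i, 0) - nu.get(i, 0)
        total += abs(gap)
    return total

n = 6
kappa = Fraction(1, n)   # curvature of this chain
J = Fraction(1, 2)       # its jump, the same at every point
nu = {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}  # binomial law
invariant = (push(nu, n) == nu)

mu = {0: Fraction(1)}    # start from the Dirac mass at x = 0
bound_holds = []
for t in range(1, 11):
    mu = push(mu, n)
    bound_holds.append(w1_on_line(mu, nu, n) <= (1 - kappa) ** t * J / kappa)
print(invariant, all(bound_holds))
```

Starting from $x = 0$, the distance $\mathcal{T}_1(\delta_0, \nu) = n/2$ equals $J(0)/\kappa$ for this chain, so the estimate of the corollary cannot be improved in general.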
Another interesting corollary is the following, which allows one to estimate
the average of a Lipschitz function under the invariant measure, knowing
some of its values. This is useful in concentration theorems, to get bounds
not only on the deviations from the average, but on what the average actually
is.

Corollary 22 - Suppose that $\kappa(x, y) \geq \kappa > 0$ for any two distinct
$x, y \in X$. Let $\nu$ be the invariant distribution.

Let $f$ be a 1-Lipschitz function. Then, for any distribution $\mu$, one has
$|\mathbb{E}_\mu f - \mathbb{E}_\nu f| \leq \mathcal{T}_1(\mu, \mu * m)/\kappa$.

In particular, for any $x \in X$ one has $|f(x) - \mathbb{E}_\nu f| \leq J(x)/\kappa$.

Proof - One has $\mathcal{T}_1(\mu * m, \nu) \leq (1 - \kappa) \mathcal{T}_1(\mu, \nu)$. Since by the triangle
inequality, $\mathcal{T}_1(\mu * m, \nu) \geq \mathcal{T}_1(\mu, \nu) - \mathcal{T}_1(\mu, \mu * m)$, one gets
$\mathcal{T}_1(\mu, \nu) \leq \mathcal{T}_1(\mu, \mu * m)/\kappa$. Now if $f$ is a 1-Lipschitz function, for any two
distributions $\mu, \mu'$ one has $|\mathbb{E}_\mu f - \mathbb{E}_{\mu'} f| \leq \mathcal{T}_1(\mu, \mu')$ hence the result.

The last assertion is simply the case when $\mu$ is the Dirac measure at $x$.

□



2.3 $L^1$ Bonnet-Myers theorems

We now give a weak analogue of the Bonnet-Myers theorem. This result 
shows in particular that positivity of Ricci curvature is a much stronger 
property than some spectral gap bound: there is no Ricci curvature analogue 
of a family of expanders. 

Proposition 23 ($L^1$ Bonnet-Myers) - Suppose that $\kappa(x, y) \geq \kappa > 0$
for all $x, y \in X$. Then for any $x, y \in X$ one has

$$d(x, y) \leq \frac{J(x) + J(y)}{\kappa(x, y)}$$

and in particular

$$\operatorname{diam} X \leq \frac{2 \sup_x J(x)}{\kappa}.$$

Proof - Let $d = d(x, y)$. By assumption we have $\mathcal{T}_1(m_x, m_y) \leq d(1 - \kappa)$.
By definition we have $\mathcal{T}_1(m_x, \delta_x) = J(x)$ and $\mathcal{T}_1(m_y, \delta_y) = J(y)$. So $d \leq
J(x) + J(y) + d(1 - \kappa)$. □

This result is not sharp at all for Brownian motion in Riemannian
manifolds (since $J \approx \varepsilon$ and $\kappa \approx \varepsilon^2 \operatorname{Ric}/N$, it fails by a factor $1/\varepsilon$ compared to
the Bonnet-Myers theorem!), but is sharp in many other examples.

For the discrete cube $X = \{0, 1\}^N$ (Example 8 above), one has $J = 1/2$
and $\kappa = 1/N$, so we get $\operatorname{diam} X \leq N$ which is the exact value.

For the discrete Ornstein-Uhlenbeck process (Example 10 above) one has
$J = 1/2$ and $\kappa = 1/2N$, so we get $\operatorname{diam} X \leq 2N$ which once more is the
exact value.

For the continuous Ornstein-Uhlenbeck process on $\mathbb{R}$ (the Ornstein-Uhlenbeck
example above, with $N = 1$), the diameter is infinite, consistently with the fact
that $J$ is unbounded. If we restrict the process to some large interval $[-R, R]$ with
$R \gg s/\sqrt{\alpha}$ (e.g. by reflecting the Brownian part), then $\sup J \sim \alpha R \, \delta t$ on
this interval, and $\kappa = (1 - e^{-\alpha \delta t}) \sim \alpha \, \delta t$ so that the diameter is bounded by
$2R$, which is correct.

These examples show that one cannot replace $J/\kappa$ with $J/\sqrt{\kappa}$ in this
result (as could be expected from the example of Riemannian manifolds).
In fact, Riemannian manifolds seem to be the only simple example where
there is a diameter bound behaving like $1/\sqrt{\kappa}$. In Section 7 we investigate
conditions under which an $L^2$ version of the Bonnet-Myers theorem holds.

In case $J$ is not bounded, we can estimate instead the "average" diameter
$\int d(x, y) \, d\nu(x) \, d\nu(y)$ under the invariant distribution $\nu$. This estimate will
prove very useful in several examples, to get bounds on the average of $\sigma(x)$
in cases where $\sigma(x)$ is unbounded but controlled by the distance to some
"origin" (see e.g. Sections 3.3.4 and 3.3.5).

Proposition 24 (Average $L^1$ Bonnet-Myers) - Suppose that
$\kappa(x, y) \geq \kappa > 0$ for any two distinct $x, y \in X$. Then for any $x \in X$,

$$\int_X d(x, y) \, d\nu(y) \leq \frac{J(x)}{\kappa}$$

and so

$$\int_{X \times X} d(x, y) \, d\nu(x) \, d\nu(y) \leq \frac{2 \inf_x J(x)}{\kappa}.$$

Proof - The first assertion follows from Corollary 22 with $f = d(x, \cdot)$.

For the second assertion, choose an $x_0$ such that $J(x_0)$ is arbitrarily close
to $\inf J$, and write

$$\int_{X \times X} d(y, z) \, d\nu(y) \, d\nu(z) \leq \int_{X \times X} \left( d(y, x_0) + d(x_0, z) \right) d\nu(y) \, d\nu(z)
= 2 \mathcal{T}_1(\delta_{x_0}, \nu) \leq 2 J(x_0)/\kappa$$

which ends the proof. □
2.4 Two constructions



Here we describe two very simple constructions which trivially preserve
positive curvature, namely, superposition and $L^1$ tensorization.

Superposition states that if we are given two random walks on the same 
space and construct a new one by, at each step, tossing a coin and deciding 
to follow either one random walk or the other, then the Ricci curvatures mix 
nicely. 

Proposition 25 (Superposition) - Let $X$ be a metric space equipped
with a family $(m^{(i)})$ of random walks. Suppose that for each $i$, the Ricci
curvature of $m^{(i)}$ is at least $\kappa_i$. Let $(\alpha_i)$ be a family of non-negative real
numbers such that $\sum \alpha_i = 1$. Define a random walk $m$ on $X$ by $m_x :=
\sum \alpha_i m_x^{(i)}$. Then the Ricci curvature of $m$ is at least $\sum \alpha_i \kappa_i$.

Proof - Let $x, y \in X$ and for each $i$ let $\xi_i$ be an optimal coupling between $m_x^{(i)}$ and
$m_y^{(i)}$. Then $\sum \alpha_i \xi_i$ is a coupling between $\sum \alpha_i m_x^{(i)}$ and $\sum \alpha_i m_y^{(i)}$, so that

$$\begin{aligned}
\mathcal{T}_1(m_x, m_y) &\leq \sum \alpha_i \, \mathcal{T}_1(m_x^{(i)}, m_y^{(i)}) \\
&\leq \sum \alpha_i (1 - \kappa_i) \, d(x, y) \\
&= \left(1 - \sum \alpha_i \kappa_i\right) d(x, y).
\end{aligned}$$

Note that the coupling above, which consists in sending each $m_x^{(i)}$ to $m_y^{(i)}$,
has no reason to be optimal, so that in general equality does not hold. □

Tensorization states that if we perform a random walk in a product space 
by deciding at random, at each step, to move in one or the other component, 
then positive curvature is preserved. 

Proposition 26 ($L^1$ tensorization) - Let $(X_1, \ldots, X_k)$ be a finite
family of metric spaces equipped with a family of random walks $(m^{(1)}, \ldots, m^{(k)})$.
Let $X$ be the product of the spaces $X_i$, equipped with the distance $d := \sum d_i$. Let
$(\alpha_i)$ be a family of non-negative real numbers such that $\sum \alpha_i = 1$. Consider
the random walk $m$ on $X$ defined by

$$m_{(x_1, \ldots, x_k)} := \sum_i \alpha_i \, \delta_{x_1} \otimes \cdots \otimes m^{(i)}_{x_i} \otimes \cdots \otimes \delta_{x_k}.$$

Suppose that for each $i$, the Ricci curvature of $m^{(i)}$ is at least $\kappa_i$. Then
the Ricci curvature of $m$ is at least $\inf \alpha_i \kappa_i$.

For example, this allows for a very short proof that the curvature of the
lazy random walk on the discrete cube $\{0, 1\}^N$ is $1/N$ (Example 8). Indeed,
it is the $N$-fold product of the random walk on $\{0, 1\}$ which sends each point
to the equilibrium distribution $(1/2, 1/2)$, hence is of curvature 1.

The case when some $\alpha_i$ is equal to 0 shows why the Ricci curvature is
given by an infimum: indeed, if $\alpha_i = 0$ then the corresponding component
never gets mixed, hence curvature cannot be positive (unless this component
is reduced to a single point).

Here the statement is restricted to a finite product for the following
technical reasons: First, to define the $L^1$ product of an infinite family, a basepoint
has to be chosen. Second, in order for the formula above to define a random
walk with finite first moment (see Definition 1), some uniform assumption
on the first moments of the $m^{(i)}$ is needed.

Proof - For $x \in X$ let $\tilde{m}_x^{(i)}$ stand for $\delta_{x_1} \otimes \cdots \otimes m^{(i)}_{x_i} \otimes \cdots \otimes \delta_{x_k}$.
Let $x = (x_i)$ and $y = (y_i)$ be two points in $X$. Then

$$\begin{aligned}
\mathcal{T}_1(m_x, m_y) &\leq \sum_i \alpha_i \, \mathcal{T}_1(\tilde{m}_x^{(i)}, \tilde{m}_y^{(i)}) \\
&\leq \sum_i \alpha_i \Big( \mathcal{T}_1(m^{(i)}_{x_i}, m^{(i)}_{y_i}) + \sum_{j \neq i} d_j(x_j, y_j) \Big) \\
&\leq \sum_i \alpha_i \Big( (1 - \kappa_i) \, d_i(x_i, y_i) + \sum_{j \neq i} d_j(x_j, y_j) \Big) \\
&= \sum_i \alpha_i \Big( -\kappa_i \, d_i(x_i, y_i) + \sum_j d_j(x_j, y_j) \Big) \\
&= \sum_i d_i(x_i, y_i) - \sum_i \alpha_i \kappa_i \, d_i(x_i, y_i) \\
&\leq (1 - \inf \alpha_i \kappa_i) \, d(x, y).
\end{aligned}$$

□



2.5 Lipschitz functions and spectral gap 

Definition 27 (Averaging operator, Laplacian) - For $f \in
L^2(X, \nu)$ let the averaging operator $M$ be

$$Mf(x) := \int_y f(y) \, dm_x(y)$$

and let $\Delta := M - \operatorname{Id}$.

(This is the layman's convention for the sign of the Laplacian, i.e. $\Delta = \frac{d^2}{dx^2}$
on $\mathbb{R}$, so that on a Riemannian manifold $\Delta$ is a negative operator.)
The following proposition also appears in [DGW04] (in the proof of
Proposition 2.10).

Proposition 28 - Let $(X, d, m)$ be a random walk on a metric space.
Let $\kappa \in \mathbb{R}$.

Then the Ricci curvature of $X$ is at least $\kappa$, if and only if, for every
$k$-Lipschitz function $f : X \to \mathbb{R}$, the function $Mf$ is $k(1 - \kappa)$-Lipschitz.

Proof - First, suppose that the Ricci curvature of $X$ is at least $\kappa$. Then
we have

$$\begin{aligned}
Mf(y) - Mf(x) &= \int_z f(y + z) - f(x + z) \\
&\leq k \int_z d(x + z, y + z) \\
&= k \, d(x, y) (1 - \kappa(x, y)).
\end{aligned}$$

Conversely, suppose that whenever $f$ is 1-Lipschitz, $Mf$ is $(1 - \kappa)$-Lipschitz.
The duality theorem for transportation distance (Theorem 1.14 in [Vil03])
states that

$$\begin{aligned}
\mathcal{T}_1(m_x, m_y) &= \sup_{f \text{ 1-Lipschitz}} \int f \, d(m_x - m_y) \\
&= \sup_{f \text{ 1-Lipschitz}} Mf(x) - Mf(y) \\
&\leq (1 - \kappa) \, d(x, y).
\end{aligned}$$

□
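
The contraction of the Lipschitz norm is easy to test on a small example. The following sketch is an illustration only: it takes a hypothetical lazy birth-and-death chain on the path $\{0, \ldots, n\}$ (stay with probability 1/2, up with probability $(n-k)/2n$, down with probability $k/2n$), whose curvature is $\kappa = 1/n$, and goes through all functions with increments in $\{-1, 0, 1\}$, which are 1-Lipschitz.

```python
from fractions import Fraction
from itertools import product

def step_law(k, n):
    """m_k for an illustrative lazy birth-and-death chain on {0, ..., n}."""
    law = {k: Fraction(1, 2)}
    if k < n:
        law[k + 1] = Fraction(n - k, 2 * n)
    if k > 0:
        law[k - 1] = Fraction(k, 2 * n)
    return law

def averaging(f, n):
    """Values of Mf on {0, ..., n}, where (Mf)(k) = sum_j f(j) m_k(j)."""
    return [sum(p * f[j] for j, p in step_law(k, n).items())
            for k in range(n + 1)]

def lipschitz_constant(f):
    # On a path with d(i, j) = |i - j| it suffices to compare neighbours.
    return max(abs(b - a) for a, b in zip(f, f[1:]))

n = 5
kappa = Fraction(1, n)
worst = Fraction(0)
for increments in product((-1, 0, 1), repeat=n):
    f = [0]
    for s in increments:
        f.append(f[-1] + s)          # f is 1-Lipschitz by construction
    worst = max(worst, lipschitz_constant(averaging(f, n)))
print(worst, 1 - kappa)
```

The largest Lipschitz constant of $Mf$ found this way is exactly $1 - \kappa$, attained by $f(k) = k$.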

Let $\nu$ be an invariant distribution of the random walk. Consider the space
$L^2(X, \nu)/\{\text{const}\}$ equipped with the norm $\|f\|^2_{L^2(X,\nu)/\{\text{const}\}} := \|f - \mathbb{E}_\nu f\|^2_{L^2(X,\nu)}$
so that

$$\|f\|^2_{L^2(X,\nu)/\{\text{const}\}} = \operatorname{Var}_\nu f = \frac{1}{2} \int_{X \times X} (f(x) - f(y))^2 \, d\nu(x) \, d\nu(y).$$

The operators $M$ and $\Delta$ are self-adjoint in $L^2(X, \nu)$ if and only if $\nu$ is
reversible for the random walk.

It is easy to check, using associativity of variances, that

$$\operatorname{Var}_\nu f = \int \operatorname{Var}_{m_x} f \, d\nu(x) + \operatorname{Var}_\nu Mf$$

so that $\|Mf\|_2 \leq \|f\|_2$. It is also clear that $\|Mf\|_\infty \leq \|f\|_\infty$.
Usually, spectral gap properties for $\Delta$ are expressed in the space $L^2$. The
proposition above only implies that the spectral radius of the operator $M$
acting on $\operatorname{Lip}(X)/\{\text{const}\}$ is at most $(1 - \kappa)$. In general it is not true that
a bound for the spectral radius of an operator on a dense subspace of a
Hilbert space implies a bound for the spectral radius on the whole space.
This holds, however, when the operator is self-adjoint or when the Hilbert
space is finite-dimensional.

Proposition 29 - Let $(X, d, m)$ be a metric space with random walk, with
invariant distribution $\nu$. Suppose that the Ricci curvature of $X$ is at least
$\kappa > 0$ and that $\sigma < \infty$. Suppose that $\nu$ is reversible, or that $X$ is finite.

Then the spectral radius of the averaging operator acting on $L^2(X, \nu)/\{\text{const}\}$
is at most $1 - \kappa$.

Proof - First, if $X$ is finite then Lipschitz functions coincide with $L^2$
functions, so that there is nothing to prove. So we suppose that $\nu$ is reversible,
i.e. $M$ is self-adjoint.

Let $f$ be a $k$-Lipschitz function. Proposition 31 below implies that
Lipschitz functions belong to $L^2$ and that the Lipschitz norm controls the $L^2$
norm. (This is where we use that $\sigma < \infty$.)

Since $M^t f$ is $k(1 - \kappa)^t$-Lipschitz one gets $\operatorname{Var} M^t f \leq C k^2 (1 - \kappa)^{2t}$ for
some constant $C$ so that $\lim_{t \to \infty} (\sqrt{\operatorname{Var} M^t f})^{1/t} \leq (1 - \kappa)$. Now Lipschitz
functions are dense in $L^2(X, \nu)$. Since $M$ is bounded and self-adjoint, its
spectral radius is at most $1 - \kappa$. □

Corollary 30 - Let $(X, d, m)$ be an ergodic random walk on a metric
space, with invariant distribution $\nu$. Suppose that the Ricci curvature of $X$
is at least $\kappa > 0$ and that $\sigma < \infty$. Suppose that $\nu$ is reversible.

Then the smallest eigenvalue of $-\Delta$ on $L^2(X, \nu)/\{\text{const}\}$ is at least $\kappa$.

Moreover the following discrete Poincaré inequalities are satisfied for $f \in
L^2(X, \nu)$:

$$\operatorname{Var}_\nu f \leq \frac{1}{\kappa(2 - \kappa)} \int \operatorname{Var}_{m_x} f \, d\nu(x)$$

and

$$\operatorname{Var}_\nu f \leq \frac{1}{2\kappa} \iint (f(y) - f(x))^2 \, d\nu(x) \, dm_x(y).$$

Proof - These are rewritings of the inequalities $\operatorname{Var}_\nu Mf \leq (1 - \kappa)^2 \operatorname{Var}_\nu f$
and $\langle f, Mf \rangle_{L^2(X,\nu)/\{\text{const}\}} \leq (1 - \kappa) \operatorname{Var}_\nu f$, respectively. □
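
Both the variance decomposition $\operatorname{Var}_\nu f = \int \operatorname{Var}_{m_x} f \, d\nu(x) + \operatorname{Var}_\nu Mf$ and the first Poincaré inequality can be checked in exact arithmetic on a small reversible chain. The sketch below is illustrative only: it assumes a hypothetical lazy birth-and-death chain on $\{0, \ldots, n\}$ (stay with probability 1/2, up with probability $(n-k)/2n$, down with probability $k/2n$), which has curvature $\kappa = 1/n$ and is reversible with respect to the binomial distribution; the test functions are arbitrary.

```python
from fractions import Fraction
from math import comb

def step_law(k, n):
    """m_k for an illustrative lazy birth-and-death chain on {0, ..., n}."""
    law = {k: Fraction(1, 2)}
    if k < n:
        law[k + 1] = Fraction(n - k, 2 * n)
    if k > 0:
        law[k - 1] = Fraction(k, 2 * n)
    return law

def mean(f, law):
    return sum(p * f(j) for j, p in law.items())

def variance(f, law):
    m = mean(f, law)
    return sum(p * (f(j) - m) ** 2 for j, p in law.items())

n = 6
kappa = Fraction(1, n)
nu = {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}  # invariant law

def check(f):
    """Return (decomposition holds, Poincare inequality holds) for f."""
    Mf = lambda k: mean(f, step_law(k, n))
    local = sum(p * variance(f, step_law(k, n)) for k, p in nu.items())
    total = variance(f, nu)
    decomposition_ok = (total == local + variance(Mf, nu))
    poincare_ok = (total <= local / (kappa * (2 - kappa)))
    return decomposition_ok, poincare_ok

results = [check(lambda k: k),
           check(lambda k: k * k),
           check(lambda k: (k % 3) - 1)]
print(results)
```

For $f(k) = k$ the Poincaré inequality is an equality on this chain, both sides being $n/4$.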

The quantities $\operatorname{Var}_{m_x} f$ and $\frac{1}{2} \int (f(y) - f(x))^2 \, dm_x(y)$ are two possible
definitions of $\|\nabla f(x)\|^2$ in a discrete setting. Though the latter is more
common, the former is preferable when the support of $m_x$ can be far away
from $x$, because it cancels out the "drift". Moreover one always has $\operatorname{Var}_{m_x} f \leq
\int (f(y) - f(x))^2 \, dm_x(y)$, so that the first form is generally sharper (note that
since $\kappa \leq 1$ one has $1/\kappa(2 - \kappa) \leq 1/\kappa$).

Reversibility is really needed here to turn an estimate of the spectral
radius of $M$ into an inequality between the norms of $Mf$ and $f$, using that
$M$ is self-adjoint. When the random walk is not reversible, a version of the
Poincaré inequality with a non-local gradient still holds (Theorem 40).

Let us compare this result to Lichnerowicz' theorem in the case of the
random walk at scale $\varepsilon$ on an $N$-dimensional Riemannian manifold with
positive Ricci curvature. The operator $\Delta$ associated with the random walk
is the difference between the mean value of a function on a ball of radius $\varepsilon$,
and its value at the center of the ball: when $\varepsilon \to 0$ this behaves like $\frac{\varepsilon^2}{2(N+2)}$
times the usual Laplacian, by taking the average on the ball of the Taylor
expansion of $f$. Meanwhile, we saw (in the Riemannian manifold example above) that
$\kappa \sim \frac{\varepsilon^2}{2(N+2)} \inf \operatorname{Ric}$,
where $\inf \operatorname{Ric}$ is the largest $K$ such that $\operatorname{Ric}(v, v) \geq K$ for all unit tangent
vectors $v$. Note that both scaling factors are the same. On the other hand
the Lichnerowicz theorem states that the smallest eigenvalue of the usual
Laplacian is at least $\frac{N}{N-1} \inf \operatorname{Ric}$. So we miss the $\frac{N}{N-1}$ factor, but otherwise get the
correct order of magnitude.

Second, let us test this corollary for the discrete cube of Example 8. In
this case the eigenbase of the discrete Laplacian is well-known (characters, or
Fourier/Walsh transform), and the spectral radius of the lazy random walk
is exactly $1 - 1/N$. Since the Ricci curvature $\kappa$ is $1/N$, the value given in
the proposition is sharp.
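
The claim about the cube can be verified directly for small $N$. The following sketch (illustrative; the function names are ours) checks that each Walsh character $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$ is an eigenfunction of the averaging operator of the lazy walk, with eigenvalue $1 - |S|/N$, where the lazy walk is assumed to stay put with probability 1/2 and otherwise flip one uniformly chosen bit.

```python
from fractions import Fraction
from itertools import product, combinations

N = 3
points = list(product((0, 1), repeat=N))

def lazy_average(f, x):
    """(Mf)(x) for the lazy walk on {0,1}^N."""
    flipped = [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(N)]
    return Fraction(1, 2) * f(x) + Fraction(1, 2 * N) * sum(f(y) for y in flipped)

def character(S):
    """Walsh character chi_S(x) = (-1)^(sum of x_i for i in S)."""
    return lambda x: (-1) ** sum(x[i] for i in S)

eigenvalues = {}
for size in range(N + 1):
    for S in combinations(range(N), size):
        chi = character(S)
        lam = 1 - Fraction(len(S), N)
        # chi_S is an eigenfunction of M with eigenvalue 1 - |S|/N
        assert all(lazy_average(chi, x) == lam * chi(x) for x in points)
        eigenvalues[S] = lam

# Spectral radius on L^2 / {const}: discard the constant character (S empty).
spectral_radius_off_constants = max(abs(l) for S, l in eigenvalues.items() if S)
print(spectral_radius_off_constants)
```

Since the $2^N$ characters form a basis of functions on the cube, this accounts for the whole spectrum, and the largest eigenvalue away from constants is $1 - 1/N = 1 - \kappa$.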

Third, consider the Ornstein-Uhlenbeck process on $\mathbb{R}$, as in the Ornstein-Uhlenbeck
example above. Its infinitesimal generator is $L = \frac{s^2}{2} \frac{d^2}{dx^2} - \alpha x \frac{d}{dx}$, and the eigenfunctions are
known to be $H_k(x \sqrt{\alpha/s^2})$ where $H_k$ is the Hermite polynomial $H_k(x) :=
(-1)^k e^{x^2} \frac{d^k}{dx^k} e^{-x^2}$. The associated eigenvalue of $L$ is $-k\alpha$, so that the spectral
gap of $L$ is $\alpha$. Now the random walk we consider is the flow $e^{\delta t L}$ at time $\delta t$
of the process (with small $\delta t$), whose eigenvalues are $e^{-k \alpha \delta t}$. So the spectral
gap of the discrete Laplacian $e^{\delta t L} - \operatorname{Id}$ is $1 - e^{-\alpha \delta t}$. Since the Ricci curvature
is $1 - e^{-\alpha \delta t}$ too, the corollary is sharp again.
3 Concentration results



3.1 Variance of Lipschitz functions 

We begin with the simplest kind of concentration, namely, an estimation of
the variance of Lipschitz functions. Contrary to Gaussian or exponential
concentration, the only assumption needed here is that the average spread $\sigma$
is finite.

Since our Gaussian concentration result will yield basically the same
variance $\sigma^2/n\kappa$, we discuss sharpness of this estimate in various examples in
Section 3.3.

Proposition 31 - Let $(X, d, m)$ be a random walk on a metric space, with
Ricci curvature at least $\kappa > 0$. Let $\nu$ be the unique invariant distribution.
Suppose that $\sigma < \infty$.

Then the variance of a 1-Lipschitz function is at most $\frac{\sigma^2}{n\kappa(2 - \kappa)} \leq \frac{\sigma^2}{n\kappa}$.

In particular, this implies that all Lipschitz functions are in $L^2/\{\text{const}\}$;
especially, $\int d(x, y)^2 \, d\nu(x) \, d\nu(y)$ is finite. The fact that the Lipschitz norm
controls the $L^2$ norm was used above in the discussion of spectral properties
of the random walk operator.

Proof - Suppose for now that $|f|$ is bounded by $A \in \mathbb{R}$, so that $\operatorname{Var} f < \infty$.
We first prove that $\operatorname{Var} M^t f$ tends to 0. Let $B_r$ be the ball of radius $r$ in $X$
centered at some basepoint. Using that $M^t f$ is $(1 - \kappa)^t$-Lipschitz on $B_r$ and
bounded by $A$ on $X \setminus B_r$, we get $\operatorname{Var} M^t f = \frac{1}{2} \iint (M^t f(x) - M^t f(y))^2 \, d\nu(x) \, d\nu(y) \leq
2 (1 - \kappa)^{2t} r^2 + 2 A^2 \nu(X \setminus B_r)$. Taking for example $r = 1/(1 - \kappa)^{t/2}$ ensures
that $\operatorname{Var} M^t f \to 0$.

As already mentioned, one has $\operatorname{Var} f = \operatorname{Var} Mf + \int \operatorname{Var}_{m_x} f \, d\nu(x)$. Since
$\operatorname{Var} M^t f \to 0$, by induction we get

$$\operatorname{Var} f = \sum_{t=0}^{\infty} \int \operatorname{Var}_{m_x} M^t f \, d\nu(x).$$

Now by definition $\operatorname{Var}_{m_x} f \leq \sigma(x)^2/n_x$. Since $M^t f$ is $(1 - \kappa)^t$-Lipschitz, we
have $\operatorname{Var}_{m_x} M^t f \leq (1 - \kappa)^{2t} \sigma(x)^2/n_x$ so that the sum above is at most $\frac{\sigma^2}{n\kappa(2 - \kappa)}$.
The case of unbounded $f$ is treated by a simple limiting argument. □

3.2 Gaussian concentration 

As mentioned above, positive Ricci curvature implies a Gaussian-then-exponential
concentration theorem. The estimated variance is $\sigma^2/n\kappa$ as above, so that
this is essentially a more precise version of Proposition 31, with some loss in
the constants. We will see in the discussion below (Section 3.3) that in the
main examples, the order of magnitude is correct.

The fact that concentration is not Gaussian far away from the mean is
genuine, as exemplified by the binomial distribution on the cube (Section 3.3.3)
or M/M/$\infty$ queues (Section 3.3.4). A purely exponential behavior can be
achieved in very simple examples if $\sigma_\infty(x)$ is not bounded (Example 14) or
if the spread $\sigma(x)^2$ grows fast enough (Section 3.3.5). In these examples, the
transition from Gaussian to non-Gaussian regime occurs roughly as predicted
by the theorem.

In the case of Riemannian manifolds, simply letting the step of the random
walk tend to 0 makes the width of the Gaussian window tend to infinity, so
that we recover Gaussian concentration as in the Lévy-Gromov or
Bakry-Émery theorems.

The width of the Gaussian window is controlled by two factors: the
quantity $\sigma_\infty$, which represents the "granularity" of the process and can result in
Poisson-like behavior; and the rate of variation of the spread $\sigma(x)^2$, which
can result in exponential behavior. The latter phenomenon leads to the
assumption that $\sigma(x)^2$ is bounded by a Lipschitz function.

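Theorem 32 - Let $(X, d, m)$ be an ergodic random walk on a metric space
as above, with invariant distribution $\nu$. Suppose that for any two distinct
points $x, y \in X$ one has $\kappa(x, y) \geq \kappa > 0$.

Let

$$D_x^2 := \frac{\sigma(x)^2}{n_x \kappa}$$

and

$$D^2 := \mathbb{E}_\nu D_x^2.$$

Suppose that the function $x \mapsto D_x^2$ is $C$-Lipschitz. Set

$$t_{\max} := \frac{2 D^2}{\max(2C, 3\sigma_\infty)}.$$

Then for any 1-Lipschitz function $f$, for any $t \leq t_{\max}$ we have

$$\nu(\{x, \ f(x) \geq t + \mathbb{E}_\nu f\}) \leq \exp\left(-\frac{t^2}{6 D^2}\right)$$

and for $t > t_{\max}$

$$\nu(\{x, \ f(x) \geq t + \mathbb{E}_\nu f\}) \leq \exp\left(-\frac{t_{\max}^2}{6 D^2} - \frac{t - t_{\max}}{\max(2C, 3\sigma_\infty)}\right).$$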
Remark 33 - It is clear from the proof below that $\sigma(x)^2/n_x \kappa$ itself need
not be Lipschitz, only bounded by some Lipschitz function. In particular, if
$\sigma(x)^2$ is bounded one can always take $D^2 = \sup_x \frac{\sigma(x)^2}{n_x \kappa}$ and $C = 0$.

It might seem that, in order to estimate $\mathbb{E}_\nu D_x^2$, one needs to know in
advance concentration properties for the invariant distribution $\nu$; however,
Proposition 24 or Corollary 22 often provides sharp estimates for $\mathbb{E}_\nu D_x^2$, as
we shall see in the examples.

In Section 3.3.5 we give a simple example where the Lipschitz constant
of $\sigma(x)^2$ is large, resulting in exponential rather than Gaussian behavior.
Later in Section 3.3 we give an example of a process with quadratic growth of $\sigma(x)^2$,
and which exhibits non-exponential tails. Thus the Lipschitz assumption
cannot simply be removed.

The assumption that $\sigma_\infty$ is bounded can be replaced with a
Gaussian-type control for the local measures $m_x$, which however generally results in
much poorer estimates of the variance in discrete situations (see Remark 35).
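Proof - This proof is a variation on standard martingale methods for
concentration (see e.g. Lemma 4.1 in [Led01]).

Let $f$ be a 1-Lipschitz function and $\lambda \geq 0$. For any smooth function $g$
and any real-valued random variable $Y$, a Taylor expansion gives $\mathbb{E} g(Y) \leq
g(\mathbb{E} Y) + \frac{1}{2} (\sup g'') \operatorname{Var} Y$, so that

$$(M e^{\lambda f})(x) \leq e^{\lambda Mf(x)} + \frac{\lambda^2}{2} e^{\lambda (Mf(x) + 2\sigma_\infty)} \operatorname{Var}_{m_x} f.$$

Take $\lambda \leq 1/3\sigma_\infty$ so that $e^{2\lambda\sigma_\infty} \leq 2$. By definition, $\operatorname{Var}_{m_x} f \leq \sigma(x)^2/n_x$
hence

$$(M e^{\lambda f})(x) \leq e^{\lambda Mf(x)} \left(1 + \lambda^2 \frac{\sigma(x)^2}{n_x}\right) \leq e^{\lambda Mf(x) + \lambda^2 \frac{\sigma(x)^2}{n_x}}.$$

But since $\sigma(x)^2/n_x \kappa$ is $C$-Lipschitz by assumption, and since besides
$Mf(x)$ is $(1 - \kappa)$-Lipschitz by Proposition 28, the sum $Mf(x) + \lambda \frac{\sigma(x)^2}{n_x}$ is
$(1 - \kappa + \lambda C \kappa)$-Lipschitz.

From now on we take $\lambda \leq 1/2C$. We can repeat the argument, setting
$f_1(x) := Mf(x) + \lambda \frac{\sigma(x)^2}{n_x}$ and using that $f_1$ is $(1 - \kappa/2)$-Lipschitz. This yields

$$(M^2 e^{\lambda f})(x) \leq (M e^{\lambda f_1})(x) \leq e^{\lambda M f_1(x) + \lambda^2 \frac{\sigma(x)^2}{n_x} (1 - \kappa/2)^2}.$$

Next, $M f_1$ is $(1 - \kappa)(1 - \kappa/2)$-Lipschitz, whereas $\lambda \frac{\sigma(x)^2}{n_x} (1 - \kappa/2)^2$ is
$\frac{\kappa}{2} (1 - \kappa/2)^2$-Lipschitz. So $f_2(x) := M f_1(x) + \lambda \frac{\sigma(x)^2}{n_x} (1 - \kappa/2)^2$ is (at least)
$(1 - \kappa/2)^2$-Lipschitz, hence

$$(M^3 e^{\lambda f})(x) \leq (M e^{\lambda f_2})(x) \leq e^{\lambda M f_2(x) + \lambda^2 \frac{\sigma(x)^2}{n_x} (1 - \kappa/2)^4}.$$

By induction, we get that $f_{k+1}(x) := M f_k(x) + \lambda \frac{\sigma(x)^2}{n_x} (1 - \kappa/2)^{2k}$ is
$(1 - \kappa/2)^{k+1}$-Lipschitz and that $(M^k e^{\lambda f})(x) \leq e^{\lambda f_k(x)}$.

Now setting $g(x) := \frac{\sigma(x)^2}{n_x}$ and expanding yields

$$f_k(x) = (M^k f)(x) + \lambda \sum_{i=1}^{k} (M^{k-i} g)(x) \, (1 - \kappa/2)^{2(i-1)}$$

so that the limit of $f_k(x)$ when $k \to \infty$ is

$$\mathbb{E}_\nu f + \lambda \sum_{i=1}^{\infty} \mathbb{E}_\nu g \, (1 - \kappa/2)^{2(i-1)} \leq \mathbb{E}_\nu f + \lambda \, \mathbb{E}_\nu g \, \frac{4}{3\kappa}.$$

Meanwhile, $(M^k e^{\lambda f})(x)$ tends to $\mathbb{E}_\nu e^{\lambda f}$, so that

$$\mathbb{E}_\nu e^{\lambda f} \leq \lim_{k \to \infty} e^{\lambda f_k} \leq e^{\lambda \mathbb{E}_\nu f + \frac{4 \lambda^2}{3\kappa} \mathbb{E}_\nu g}.$$

We can conclude by a standard Chebyshev inequality argument. □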
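
To make the final Chebyshev step concrete: the exponential moment estimate $\mathbb{E}_\nu e^{\lambda(f - \mathbb{E}_\nu f)} \leq e^{4\lambda^2 D^2/3}$, valid for $\lambda \leq 1/\max(2C, 3\sigma_\infty)$, gives $\nu(f \geq \mathbb{E}_\nu f + t) \leq e^{-\lambda t + 4\lambda^2 D^2/3}$, and the choice $\lambda = t/2D^2$ yields $e^{-t^2/6D^2}$ as long as it is admissible. The sketch below evaluates the resulting bound for the lazy walk on $\{0,1\}^N$ and compares it with exact binomial tails. It is an illustration under assumptions of ours: $f$ is the Hamming weight, $\sigma_\infty = 1$, $C = 0$, and $n_x$ is replaced by its lower bound 1, so that `D2` over-estimates $D^2$.

```python
from math import comb, exp

def gaussian_then_exponential_bound(t, D2, C, sigma_inf):
    """Upper bound on nu(f >= E f + t) for 1-Lipschitz f, from the Chebyshev step."""
    lam_max = 1.0 / max(2 * C, 3 * sigma_inf)   # largest admissible lambda
    t_max = 2 * D2 * lam_max
    if t <= t_max:
        return exp(-t * t / (6 * D2))           # lambda = t / (2 D^2)
    return exp(-t_max ** 2 / (6 * D2) - (t - t_max) * lam_max)  # lambda = lam_max

N = 20
kappa = 1 / N
sigma2 = 0.25 + (N - 1) / (2 * N)   # spread^2 of the lazy walk, same at every x
D2 = sigma2 / kappa                 # uses n_x >= 1, hence an over-estimate of D^2

def binomial_tail(t):
    """nu(f >= N/2 + t) for f = Hamming weight under the uniform law on the cube."""
    return sum(comb(N, k) for k in range(N + 1) if k >= N / 2 + t) / 2 ** N

rows = [(t, binomial_tail(t), gaussian_then_exponential_bound(t, D2, 0.0, 1.0))
        for t in range(1, 11)]
for t, actual, bound in rows:
    print(t, actual, bound)
```

With these crude constants the bound is far from the true tail, but it holds for every $t$ and switches from the Gaussian to the exponential regime at $t_{\max} = 2D^2/3$.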

Remark 34 - The proof provides a similar concentration result for the
finite-time measures $m_x^{*k}$ as well, with variance

$$D_{x,k}^2 = \sum_{i=1}^{k} (1 - \kappa/2)^{2(i-1)} \left( M^{k-i} \frac{\sigma^2}{n} \right)(x)$$

and the same expression for $t_{\max}$.

Remark 35 - The condition that $\sigma_\infty$ is uniformly bounded can be replaced
with a Gaussian-type assumption, namely that for each measure $m_x$ there
exists a number $s_x$ such that $\mathbb{E}_{m_x} e^{\lambda f} \leq e^{\lambda^2 s_x^2/2} e^{\lambda \mathbb{E}_{m_x} f}$ for any 1-Lipschitz
function $f$. Then a similar theorem holds, with $\sigma(x)^2$ replaced with $s_x^2$. (When
$s_x^2$ is constant this is Proposition 2.10 in [DGW04].) However, this is
generally not well-suited to discrete settings, because when transition
probabilities are small, the best $s_x^2$ for which such an inequality is satisfied is usually
much larger than the actual variance $\sigma(x)^2$: for example, if two points $x$
and $y$ are at distance 1 and $m_x(y) = \varepsilon$, $s_x$ must satisfy $e^{-1/2 s_x^2} \geq \varepsilon$ hence
$s_x^2 \geq 1/(2 \ln(1/\varepsilon)) \gg \varepsilon$. Thus making this assumption will provide extremely
poor estimates of the variance $D^2$ when some transition probabilities are
small (e.g. for binomial distributions on the discrete cube); however, when
this does not occur (e.g. for the uniform distribution on the discrete cube),
this assumption allows one to get rid of $\sigma_\infty$, and even get genuine Gaussian
concentration for all $t \in \mathbb{R}$ in the case $C = 0$.
3.3 Examples revisited



Let us test the sharpness of these estimates in some examples, beginning with
the simplest ones. In each case, we gather the relevant quantities in a table.
Recall that $\approx$ denotes an equality up to a multiplicative universal constant
(typically $\leq 4$), while symbol $\sim$ denotes usual asymptotic equivalence (with
the correct constant).



3.3.1 Riemannian manifolds 

First, let $X$ be an $N$-dimensional Riemannian manifold with positive Ricci
curvature. Equip this manifold with the random walk at scale $\varepsilon > 0$, as in
Example d

Let $\inf \mathrm{Ric}$ denote the largest $K > 0$ such that $\mathrm{Ric}(v, v) \geq K$ for any unit
tangent vector $v$. Then the relevant quantities for this random walk are as
follows (see Section 8 for the proofs).



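Ricci curvature                  $\kappa \sim \frac{\varepsilon^2}{2(N+2)} \inf \mathrm{Ric}$
Spread                           $\sigma(x)^2 \sim \varepsilon^2 \frac{N}{N+2}$
Dimension                        $n \approx N$
Variance (Levy-Gromov thm.)      $\approx 1/\inf \mathrm{Ric}$
Gaussian variance (Thm. 32)      $D^2 \approx 1/\inf \mathrm{Ric}$
Gaussian range                   $t_{\max} \approx 1/(\varepsilon \inf \mathrm{Ric}) \to \infty$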



So, up to some (small) constants, we recover Gaussian concentration as 
in the Levy-Gromov theorem. 

The same applies to diffusions with a drift on a Riemannian manifold. To
be consistent with the notation of Example [EH in the table above $\varepsilon$ has to
be replaced with $\sqrt{(N+2)\,\delta t}$, and $\inf \mathrm{Ric}$ with $\inf \left(\mathrm{Ric}(v, v) - 2\nabla^{\mathrm{sym}} F(v, v)\right)$
for $v$ a unit tangent vector. (In the non-compact case, care has to be taken
since the Brownian motion on the manifold may not exist, and even if it does
its approximation at time $\delta t$ may not converge uniformly on the manifold.
In explicit examples such as the Ornstein-Uhlenbeck process, however, this
is not a problem.)



3.3.2 Discrete cube 

Back to the discrete cube $\{0, 1\}^N$ of Example EJ equipped with its graph
distance (Hamming metric) and lazy random walk.

Ricci curvature                  $\kappa = 1/N$
Spread                           $\sigma(x)^2 \approx 1 \ \forall x$
Dimension                        $n \approx 1$
Gaussian variance (Thm. 32)      $D^2 \approx N$
Actual variance                  $N/4$





The following simple remark allows one to actually compute the small numerical
constants implied in the notation $\approx$, and to check that Proposition [3H
gives a sharp value when $N \to \infty$.

Proposition 36 - Let $m$ be the lazy simple random walk on a locally
finite graph. Then, for any vertex $x$ one has $\sigma(x)^2/n_x \leq 1/2$.

Applying this to the estimate of Proposition EH for the discrete cube, one
gets $\sigma^2/n\kappa(2-\kappa) \leq 1/2\kappa(2-\kappa)$ which, for $\kappa = 1/N$, yields $N/\bigl(2(2 - 1/N)\bigr) \sim
N/4$. (One can actually get exactly $N/4$ by using a continuous-time random
walk instead.)

Proof - By definition $\sigma(x)^2/n_x$ is the maximal variance, under $m_x$, of a
1-Lipschitz function. So let $f$ be a 1-Lipschitz function on the graph. Since
variance is invariant under adding a constant, we can assume that $f(x) = 0$.
Then $|f(y)| \leq 1$ for any neighbor $y$ of $x$. Since $m$ is the lazy simple random
walk, we have $m_x(x) \geq 1/2$ (with equality if there are no loops) and the
mass, under $m_x$, of all neighbors of $x$ is at most 1/2. Hence $\mathrm{Var}_{m_x} f =
\mathbb{E}_{m_x} f^2 - (\mathbb{E}_{m_x} f)^2 \leq \mathbb{E}_{m_x} f^2 \leq 1/2$.

This value is actually achieved when $x$ has an even number of neighbors
and when no two distinct neighbors of $x$ are neighbors; in this case one can
take $f(x) = 0$, $f = 1$ on half the neighbors of $x$ and $f = -1$ on the remaining
neighbors of $x$. □
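The extremal function of this proof and the resulting bound for the cube can be checked with exact rational arithmetic. The sketch below assumes the lazy walk on $\{0,1\}^N$ stays put with probability 1/2 and moves to each of the $N$ neighbors with probability $1/(2N)$; the function names are illustrative and not taken from the text.

```python
from fractions import Fraction

def extremal_variance(n_bits):
    """Variance under m_x of f with f(x)=0, f=+1 on half the neighbors, f=-1 on the rest.

    m_x is the lazy simple random walk on {0,1}^N (assumed: stay with
    probability 1/2, each neighbor with probability 1/(2N)). N must be even.
    """
    assert n_bits % 2 == 0
    stay, move = Fraction(1, 2), Fraction(1, 2 * n_bits)
    values = [0] + [1] * (n_bits // 2) + [-1] * (n_bits // 2)
    probs = [stay] + [move] * n_bits
    mean = sum(p * v for p, v in zip(probs, values))
    second_moment = sum(p * v * v for p, v in zip(probs, values))
    return second_moment - mean ** 2

def cube_variance_bound(n_bits):
    """The bound 1 / (2 kappa (2 - kappa)) evaluated at kappa = 1/N."""
    kappa = Fraction(1, n_bits)
    return 1 / (2 * kappa * (2 - kappa))
```

For large $N$ the second function approaches $N/4$, which is the actual variance quoted in the table.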

3.3.3 Binomial distributions 

The occurrence of a finite range $t_{\max}$ for the Gaussian behavior of tails is
genuine, as the following example shows.

Let $X = \{0, 1\}^N$ equipped with its Hamming metric (each edge is of
length 1). Consider the following Markov chain on $X$: for some $0 < p < 1$,
at each step, choose a bit at random among the $N$ bits; if it is equal to 0,
flip it to 1 with probability $p$; if it is equal to 1, flip it to 0 with probability
$1 - p$. The binomial distribution $\nu((x_i)) = \prod_i p^{x_i} (1-p)^{1-x_i}$ is reversible for
this Markov chain. The Ricci curvature of this Markov chain is $1/N$.

Let $k$ be the number of bits of $x \in X$ which are equal to 1. Then $k$ follows
a Markov chain on $\{0, 1, \ldots, N\}$, whose transition probabilities are:

$$p_{k,k+1} = p(1 - k/N), \qquad p_{k,k-1} = (1-p)k/N, \qquad p_{k,k} = pk/N + (1-p)(1 - k/N).$$

The binomial distribution with parameters $N$ and $p$, namely $\binom{N}{k} p^k (1-p)^{N-k}$,
is reversible for this Markov chain. Moreover, the Ricci curvature of
this Markov chain is $1/N$.
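Both claims about the projected chain (reversibility of the binomial distribution and curvature $1/N$ between adjacent states) can be checked numerically. The sketch below is illustrative only: it computes the $L^1$ transportation distance on the integers through cumulative distribution functions, and takes the curvature between $k$ and $k+1$ to be $1 - T_1(m_k, m_{k+1})$; all function names are assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def transition_row(k, n, p):
    """Law of the next state of the projected chain started at k, as a dict."""
    return {
        k + 1: p * (1 - k / n),
        k - 1: (1 - p) * k / n,
        k: p * k / n + (1 - p) * (1 - k / n),
    }

def w1_on_integers(mu, nu):
    """L^1 transportation distance on Z: sum over j of |F_mu(j) - F_nu(j)|."""
    lo = min(min(mu), min(nu))
    hi = max(max(mu), max(nu))
    total, cdf_gap = 0.0, 0.0
    for j in range(lo, hi + 1):
        cdf_gap += mu.get(j, 0.0) - nu.get(j, 0.0)
        total += abs(cdf_gap)
    return total

def coarse_ricci(k, n, p):
    """kappa(k, k+1) = 1 - T_1(m_k, m_{k+1}) / d(k, k+1), with d(k, k+1) = 1."""
    return 1.0 - w1_on_integers(transition_row(k, n, p), transition_row(k + 1, n, p))

def detailed_balance_gap(k, n, p):
    """nu(k) p_{k,k+1} - nu(k+1) p_{k+1,k}; zero when the binomial law is reversible."""
    forward = binomial_pmf(k, n, p) * transition_row(k, n, p)[k + 1]
    backward = binomial_pmf(k + 1, n, p) * transition_row(k + 1, n, p)[k]
    return forward - backward
```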

Now, fix some $\lambda > 0$ and consider the case $p = \lambda/N$. Let $N \to \infty$. It is
well-known that the invariant distribution tends to the Poisson distribution
$e^{-\lambda} \lambda^k / k!$ on $\mathbb{N}$.

Let us see how Theorem 32 performs on this example. The table below
applies either to the full space $\{0, 1\}^N$, with $k$ the function "number of 1's",
or to its projection on $\{0, 1, \ldots, N\}$. Note the use of Proposition 24 to
estimate $\sigma^2$, without having to resort to explicit knowledge of the invariant
distribution. (All constants implied in the $O(1/N)$ notation are small and
completely explicit.)



Ricci curvature                  $\kappa = 1/N$
Spread                           $\sigma(k)^2 = (\lambda + k)/N + O(1/N^2)$
Estimated $\mathbb{E}k$ (Prop.)          $\mathbb{E}k \leq J(0)/\kappa = \lambda$
Actual $\mathbb{E}k$                     $\mathbb{E}k = \lambda$
Average spread                   $\sigma^2 = \mathbb{E}\sigma(k)^2 = 2\lambda/N + O(1/N^2)$
Dimension                        $n \geq 1$
Estimated variance (Prop. I3B    $\sigma^2/n\kappa(2-\kappa) = \lambda + O(1/N)$
Actual variance                  $\lambda$
Gaussian variance (Thm. 32)      $D^2 = 2\lambda + O(1/N)$
Lipschitz constant of $D^2$      $C = 1 + O(1/N)$
Gaussian range                   $t_{\max} = 4\lambda/3$



The Poisson distribution has a roughly Gaussian behavior (with variance
$\lambda$) in a range of size approximately $\lambda$ around the mean; further away, it
decreases like $e^{-k \ln k}$ which is not Gaussian. This is in good accordance with
the theorem, and shows that the Gaussian range cannot be extended.

3.3.4 A continuous-time example: $M/M/\infty$ queues

Here we show how to apply the theorem above to a continuous-time example,
the $M/M/\infty$ queue. These queues were brought to my attention by
D. Chafai.

The $M/M/\infty$ queue consists of an infinite number of "servers". Each
server can be free (0) or busy (1). The state space consists of all sequences
in $\{0, 1\}^{\mathbb{N}}$ with a finite number of 1's. The dynamics are as follows: Fix two
numbers $\lambda > 0$ and $\mu > 0$. At a rate $\lambda$ per unit of time, a client arrives
and the first free server becomes busy. At a rate $\mu$ per unit of time, each
busy server finishes its job (independently of the others) and becomes free.
The number $k \in \mathbb{N}$ of busy servers is a continuous-time Markov chain, whose
transition probabilities at small times $t$ are given by

$$p^t_{k,k+1} = \lambda t + O(t^2), \qquad p^t_{k,k-1} = k\mu t + O(t^2), \qquad p^t_{k,k} = 1 - (\lambda + k\mu)t + O(t^2).$$

If we replace $\lambda$ with $\lambda/N$ and $\mu$ with $1/N$, this Markov chain appears as
the limit of the binomial example above. This is especially clear in the table
below.

This system is often presented as a discrete analogue of an Ornstein-Uhlenbeck
process, since asymptotically the drift is linear towards the origin.
However, it is not symmetric around the mean, and moreover the invariant
(actually reversible) distribution $\nu$ is a Poisson distribution (with parameter
$\lambda/\mu$), rather than a Gaussian.

In this continuous-time setting, the definitions are adapted as follows:
$\kappa(x,y) := -\frac{d}{dt} T_1(m^t_x, m^t_y)/d(x,y)$ (as mentioned in the introduction) and
$\sigma(x)^2 := \frac{1}{2}\frac{d}{dt} \iint d(y,z)^2 \, dm^t_x(y) \, dm^t_x(z)$, where $m^t_x$ is the law at time $t$ of the
process starting at $x$. It is immediate to check that the Ricci curvature of this
process is $\mu$. Proposition [3D (with $\sigma^2/2n\kappa$ instead of $\sigma^2/n\kappa(2-\kappa)$ because
both $\sigma^2$ and $\kappa$ tend to 0 for the discrete-time approximation) and Theorem 32
still hold.

The relevant quantities are as follows. 



Ricci curvature                  $\kappa = \mu$
Spread                           $\sigma(k)^2 = k\mu + \lambda$
Estimated $\mathbb{E}k$ (Prop. [MI)      $\mathbb{E}k \leq J(0)/\kappa = \lambda/\mu$
Actual $\mathbb{E}k$                     $\mathbb{E}k = \lambda/\mu$
Average spread                   $\sigma^2 = \mathbb{E}\sigma(k)^2 = 2\lambda$
Dimension                        $n \geq 1$
Estimated variance (Prop. IBTT)  $\sigma^2/2n\kappa = \lambda/\mu$
Actual variance                  $\lambda/\mu$
Gaussian variance (Thm. 32)      $D^2 = 2\lambda/\mu$
Lipschitz constant of $D^2$      $C = 1$
Gaussian range                   $t_{\max} = 4\lambda/3\mu$



So once more Theorem 32 is in excellent accordance with the behavior
of the random walk, whose invariant distribution is Poisson with mean and
variance $\lambda/\mu$.
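The entries of this table can be reproduced numerically from the transition rates alone. The following sketch builds the invariant distribution of the number of busy servers from detailed balance on a truncated state space, then evaluates the mean, the variance, the average spread $\mathbb{E}\sigma(k)^2$ and the estimate $\sigma^2/2n\kappa$ with $n = 1$ and $\kappa = \mu$. The truncation level `k_max` and the function name are assumptions made for illustration.

```python
def mm_infinity_summary(lam, mu, k_max=200):
    """Return (mean, variance, average spread, estimated variance) for the M/M/infinity queue."""
    weights = [1.0]
    for k in range(k_max):
        # detailed balance: nu(k) * lam = nu(k+1) * (k+1) * mu
        weights.append(weights[-1] * lam / ((k + 1) * mu))
    total = sum(weights)
    nu = [w / total for w in weights]

    mean = sum(k * w for k, w in enumerate(nu))
    variance = sum(k * k * w for k, w in enumerate(nu)) - mean ** 2
    # spread of the table: sigma(k)^2 = k * mu + lam
    average_spread = sum((k * mu + lam) * w for k, w in enumerate(nu))
    kappa, dimension = mu, 1.0
    estimated_variance = average_spread / (2 * dimension * kappa)
    return mean, variance, average_spread, estimated_variance
```

With rates `lam` and `mu`, all of the mean, the variance and the estimated variance come out as `lam / mu`, and the average spread as `2 * lam`.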

An advantage of this approach is that it can be generalized to situations
where the rates of the servers are not constant, but bounded between,
say, $\mu_0/10$ and $10\mu_0$. Indeed, the $M/M/\infty$ queue above can be seen as a
Markov chain in the full configuration space of the servers, namely the space
of all sequences over the alphabet {free, busy} containing a finite number
of "busy". It is easy to check that the Ricci curvature is still equal to $\mu$ in
this configuration space. Now let us consider the case of variable rates: in
this situation, the number of busy servers is generally not Markovian, so one
has to work in the configuration space. If the rate of the $i$-th server is $\mu_i$,
the Ricci curvature is $\inf \mu_i$ in the configuration space, whereas the spread is
controlled by $\sup \mu_i$. So if the rates vary in a bounded range, Ricci curvature
still provides a Gaussian-type control, though an explicit description of the
invariant distribution is not available.

3.3.5 An example of exponential concentration 

We give here a very simple example of a Markov chain which has positive
curvature but for which concentration is not Gaussian but exponential, due
to large variations of the spread, resulting in a large value of $C$. An even
simpler example, with exponential concentration due to unbounded $\sigma_\infty(x)$,
was given in the introduction (Example 14).

This is a continuous-time random walk on $\mathbb{N}$ defined as follows. Take
$\alpha < \beta \in \mathbb{R}$. For $k \in \mathbb{N}$, the transition rate from $k$ to $k+1$ is $(k+1)\alpha$,
whereas the transition rate from $k+1$ to $k$ is $(k+1)\beta$. It is immediate to
check that the geometric distribution with decay $\alpha/\beta$ is reversible for this
Markov chain.

The Ricci curvature of this Markov chain is easily seen to be $\beta - \alpha$.
We have $\sigma(k)^2 = (k+1)\alpha + k\beta$, so that $\sigma(k)^2$ is $(\alpha+\beta)$-Lipschitz and
$C = (\alpha+\beta)/(\beta-\alpha)$.

The expectation of $k$ under the invariant distribution can be bounded by
$J(0)/\kappa = \alpha/(\beta-\alpha)$ by Proposition 24, which is actually the exact value.
So the expression above for $\sigma(k)^2$ yields $\sigma^2 = 2\alpha\beta/(\beta-\alpha)$. Consequently,
the estimated variance $\sigma^2/2n\kappa$ (obtained by the continuous-time version of
Proposition [HI) is at most $\alpha\beta/(\beta-\alpha)^2$, which is the actual value.
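These closed forms are easy to confirm numerically. The sketch below assumes the invariant distribution is the geometric law $\nu(k) = (1-q)q^k$ with $q = \alpha/\beta$, truncated at a hypothetical level `k_max`, and takes $n = 1$ in the estimate $\sigma^2/2n\kappa$.

```python
def geometric_chain_summary(alpha, beta, k_max=2000):
    """Return (mean, variance, average spread, estimated variance) for the birth-death chain
    with rates k -> k+1 equal to (k+1)*alpha and k+1 -> k equal to (k+1)*beta."""
    q = alpha / beta
    nu = [(1 - q) * q ** k for k in range(k_max + 1)]

    mean = sum(k * w for k, w in enumerate(nu))
    variance = sum(k * k * w for k, w in enumerate(nu)) - mean ** 2
    # spread from the text: sigma(k)^2 = (k+1)*alpha + k*beta
    average_spread = sum(((k + 1) * alpha + k * beta) * w for k, w in enumerate(nu))
    kappa = beta - alpha
    estimated_variance = average_spread / (2 * kappa)
    return mean, variance, average_spread, estimated_variance
```

The returned values can be compared with $\alpha/(\beta-\alpha)$, $\alpha\beta/(\beta-\alpha)^2$, $2\alpha\beta/(\beta-\alpha)$ and $\alpha\beta/(\beta-\alpha)^2$ respectively.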

Now consider the case when $\beta - \alpha$ is small. If we try to apply Theorem 32
without taking into account the variations of the spread (witnessed by the
constant $C$), we get blatantly false results since the invariant distribution
is not Gaussian at all. In the regime where $\beta - \alpha \to 0$, the width of the
Gaussian window in Theorem 32 is $D^2/C \approx \alpha/(\beta-\alpha)$. This is fine, as this
is the decay distance of the invariant distribution, and in this interval both
the Gaussian and geometric estimates are close to 1 anyway. But if the $C$
factor was not included, we would get $D^2/\sigma_\infty = \alpha\beta/(\beta-\alpha)^2$, which is much
larger; the invariant distribution is clearly not Gaussian on this interval.

Moreover, Theorem 32 predicts, in the exponential regime, an $\exp(-t/2C)$
behavior for concentration. Here the asymptotic behavior of the invariant
distribution is $(\alpha/\beta)^t \sim (1 - 2/C)^t \sim e^{-2t/C}$ when $\beta - \alpha$ is small. So we see
that (up to a constant 4) the exponential decay rate predicted by Theorem 32
is genuine.

3.3.6 Heavy tails 

It is clear that a variance control alone does not imply any concentration
beyond the Bienayme-Chebyshev inequality. We now show that this is still the
case even with the positive curvature assumption. Namely, in Theorem 32,
neither the assumption that $\sigma(x)^2$ is Lipschitz, nor the assumption that
$\sigma_\infty$ is bounded, can be removed (but see Remark 35).

Heavy tails with non-Lipschitz $\sigma(x)^2$. Our next example shows that if
the spread $\sigma(x)^2$ is not Lipschitz, then non-exponential tails may occur in
spite of positive curvature.

Consider the continuous-time random walk on $\mathbb{N}$ defined as follows: the
transition rate from $k$ to $k+1$ is $a(k+1)^2$, whereas the transition rate from
$k$ to $k-1$ is $a(k+1)^2 + bk$ for $k \geq 1$. Here $a, b > 0$ are fixed.

We have $\kappa = b$ and $\sigma(k)^2 = 2a(k+1)^2 + bk$, which is obviously not
Lipschitz.

This Markov chain has a reversible measure $\nu$, which satisfies
$\nu(k)/\nu(k-1) = ak^2/(a(k+1)^2 + bk) = 1 - \frac{1}{k}\left(2 + \frac{b}{a}\right) + O(1/k^2)$. Consequently,
asymptotically $\nu(k)$ behaves like

$$\prod_{i=1}^{k} \left(1 - \frac{1}{i}\left(2 + \frac{b}{a}\right)\right) \approx e^{-(2+b/a)\sum_{i=1}^{k} \frac{1}{i}} \approx k^{-(2+b/a)}$$

thus exhibiting heavy, non-exponential tails.

This shows that the Lipschitz assumption for $\sigma(x)^2$ cannot be removed,
even if in this case $\sigma_\infty$ is bounded by 1. It would seem reasonable to look for
a systematic correspondence between the asymptotic behavior of $\sigma(x)^2$ and
the behavior of tails.
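The power-law decay can be observed directly by iterating the ratio $\nu(k)/\nu(k-1)$ and measuring the slope of $\log \nu(k)$ against $\log k$. The sketch below is illustrative only; the two sample points `k_low` and `k_high` are arbitrary choices, and the measured slope is only expected to approach $-(2 + b/a)$ as they grow.

```python
import math

def log_reversible_measure(a, b, k_max):
    """log nu(k) - log nu(0) for k = 0..k_max, using nu(k)/nu(k-1) = a k^2 / (a (k+1)^2 + b k)."""
    logs = [0.0]
    for k in range(1, k_max + 1):
        logs.append(logs[-1] + math.log(a * k * k / (a * (k + 1) ** 2 + b * k)))
    return logs

def tail_exponent(a, b, k_low=1000, k_high=100000):
    """Slope of log nu(k) versus log k between k_low and k_high."""
    logs = log_reversible_measure(a, b, k_high)
    return (logs[k_high] - logs[k_low]) / (math.log(k_high) - math.log(k_low))
```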






Heavy tails with unbounded $\sigma_\infty$. Consider the following random walk
on $\mathbb{N}^*$: a number $k$ goes to 1 with probability $1 - 1/4k^2$ and to $2k$ with
probability $1/4k^2$. One can check that $\kappa \geq 1/2$. These probabilities are
chosen so that $\sigma(k)^2 = (2k-1)^2 \times 1/4k^2 \times (1 - 1/4k^2) \leq 1$, so that the variance
of the invariant distribution is small. However, let us evaluate the probability
that, starting at 1, the first $i$ steps consist in doing a multiplication by 2,
so that we end at $2^i$; this probability is $\prod_{j=0}^{i-1} \frac{1}{4 (2^j)^2} = 4^{-i - i(i-1)/2}$. Setting
$i = \log_2 k$, we see that the invariant distribution $\nu$ satisfies

$$\nu(k) \geq \nu(1) \, 2^{-\log_2 k \,(\log_2 k + 1)}$$

for $k$ a power of 2. This is clearly not Gaussian or exponential, though $\sigma(k)^2$
is bounded.
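The closed form for the probability of $i$ consecutive doublings can be checked exactly. This is a small sketch with an illustrative function name; it multiplies the probabilities $1/4k^2$ along the path $1, 2, 4, \ldots, 2^{i-1}$.

```python
from fractions import Fraction

def doubling_probability(i):
    """Probability that the first i steps from 1 are all doublings (ending at 2**i)."""
    prob = Fraction(1)
    for j in range(i):
        k = 2 ** j                      # current position before the j-th doubling
        prob *= Fraction(1, 4 * k * k)  # probability of jumping from k to 2k
    return prob
```

Since $4^{-i - i(i-1)/2} = 2^{-i(i+1)}$, the result decays like $2^{-(\log_2 k)^2}$ in terms of the endpoint $k = 2^i$, which is slower than any exponential in $k$.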



4 Local control and logarithmic Sobolev inequality

The estimates above (e.g. for the spectral gap) were global: we used that
the averaging operator $M$ transforms a 1-Lipschitz function into a $(1-\kappa)$-Lipschitz
function. Now we turn to some form of control of the gradient
of $Mf$ at some point, in terms of the gradient of $f$ at neighboring points.
This is closer to classical Bakry-Emery theory, and allows one to get a kind of
logarithmic Sobolev inequality.

Definition 37 - Choose $\lambda > 0$ and, for any function $f : X \to \mathbb{R}$, define
the $\lambda$-range gradient of $f$ by

$$(Df)(x) := \sup_{y, y' \in X} \frac{|f(y) - f(y')|}{d(y, y')} \, e^{-\lambda d(x,y) - \lambda d(x,y')}$$

This is a kind of "mesoscopic" Lipschitz constant of $f$ around $x$. Note
that if $f$ is a smooth function on a compact Riemannian manifold, when
$\lambda \to \infty$ this quantity tends to $|\nabla f(x)|$.

It is important to note that $Df$ is $2\lambda$-log-Lipschitz.
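On a finite metric space the $\lambda$-range gradient is a finite maximum and can be computed by brute force. The sketch below assumes the exponent $-\lambda(d(x,y) + d(x,y'))$, which is the form used later in the proof of Proposition 43; the function name and signature are illustrative.

```python
import math

def range_gradient(f, points, dist, lam, x):
    """Brute-force lambda-range gradient (Df)(x) on a finite metric space.

    f      : function on points
    points : iterable of points of the space
    dist   : metric, dist(a, b) >= 0
    lam    : the parameter lambda > 0
    """
    pts = list(points)
    best = 0.0
    for y in pts:
        for y2 in pts:
            d = dist(y, y2)
            if d == 0:
                continue  # the quotient is only taken over distinct points
            weight = math.exp(-lam * (dist(x, y) + dist(x, y2)))
            best = max(best, abs(f(y) - f(y2)) / d * weight)
    return best
```

Moving $x$ by a distance $\delta$ changes each of $d(x,y)$ and $d(x,y')$ by at most $\delta$, which is why the values returned at nearby points differ by a factor at most $e^{2\lambda\delta}$.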

We will also need a control on negative curvature: in a Riemannian manifold,
the Ricci curvature might be $\geq \varepsilon$ because there is a direction of curvature
1 and a direction of curvature $-1 + \varepsilon$. The next definition captures these
variations.

Definition 38 (Unstability) Let

$$\kappa_+(x, y) := \frac{1}{d(x,y)} \int_z \bigl(d(x, y) - d(x+z, y+z)\bigr)_+$$

and

$$\kappa_-(x, y) := \frac{1}{d(x,y)} \int_z \bigl(d(x, y) - d(x+z, y+z)\bigr)_-$$

where $a_+$ and $a_-$ are the positive and negative part of $a \in \mathbb{R}$, so that $\kappa(x, y) =
\kappa_+(x, y) - \kappa_-(x, y)$. (The integration over $z$ is under a coupling realizing the
value of $\kappa(x, y)$.)

The unstability $U(x, y)$ is defined as

$$U(x,y) := \frac{\kappa_-(x,y)}{\kappa(x,y)} \quad \text{and} \quad U := \sup_{x,y \in X,\ x \neq y} U(x,y)$$



Remark 39 - If $X$ is $\varepsilon$-geodesic, then an upper bound for $U(x,y)$ with
$d(x, y) \leq \varepsilon$ implies the same upper bound for $U$.

In most discrete examples given in the introduction (Examples El El, H21 
[TBI [ML), unstability is actually 0, meaning that the coupling between $m_x$
and $m_y$ never increases distances (this could be a possible definition of
non-negative sectional curvature for Markov chains). In Riemannian manifolds,
unstability is controlled by the largest negative sectional curvature, but this
does not influence the final results since one can take arbitrarily small steps
for the random walk. Interestingly, in Example 17 (Glauber dynamics),
unstability depends on temperature.

Due to the use of the gradient $D$, the theorem below is interesting only
if a reasonable estimate for $Df$ can be obtained depending on "local" data.
This is not the case when $f$ is not $\lambda$-log-Lipschitz. This is consistent with the
fact mentioned above, that Gaussian concentration of measure only occurs
in a finite range, with exponential concentration afterwards, which implies
that no true logarithmic Sobolev inequality can hold in general.

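Theorem 40 - Suppose that Ricci curvature is at least $\kappa > 0$. Let
$\lambda \leq \frac{1}{24\sigma_\infty(1+U)}$ and consider the $\lambda$-range gradient $Df$. Then for any function
$f : X \to \mathbb{R}$ such that $Df < \infty$, one has

$$\mathrm{Var}_\nu f \leq \left(\sup_x \frac{4\sigma(x)^2}{\kappa \, n_x}\right) \int (Df)^2 \, d\nu$$

and for positive $f$,

$$\mathrm{Ent}_\nu f \leq \left(\sup_x \frac{4\sigma(x)^2}{\kappa \, n_x}\right) \int \frac{(Df)^2}{f} \, d\nu$$

where $\nu$ is the invariant distribution.

If moreover the random walk is reversible with respect to $\nu$, then

$$\mathrm{Var}_\nu f \leq \int V(x) \, Df(x)^2 \, d\nu(x)$$

and

$$\mathrm{Ent}_\nu f \leq \int V(x) \, \frac{Df(x)^2}{f(x)} \, d\nu(x)$$

where

$$V(x) = 2 \sum_{t=0}^{\infty} (1 - \kappa/2)^{2t} \, M^{t+1}\!\left(\frac{\sigma(y)^2}{n_y}\right)(x).$$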

The form involving $V(x)$ is motivated by the fact that, for reversible diffusions
in $\mathbb{R}^N$ with non-constant diffusion coefficients, these coefficients naturally
appear in the formulation of functional inequalities (see e.g. [AMTU01]).
The quantity $V(x) \, Df(x)^2$ is to be thought of as a crude version of the Dirichlet
form associated with the random walk. It would be more satisfying to
obtain inequalities involving the latter (compare Corollary 30), but I could
not get a version of the commutation property $DM \leq (1 - \kappa/2)MD$ involving
the Dirichlet form.

Remark 41 - If $\frac{\sigma(x)^2}{n_x \kappa}$ is $C$-Lipschitz (as in Theorem 32), then
$V(x) \leq \frac{4}{\kappa} \int \frac{\sigma(y)^2}{n_y} \, d\nu(y) + 2C \frac{J(x)}{\kappa}$.



Examples. Let us compare this theorem to classical results. 

In the case of a Riemannian manifold, for any smooth function $f$ we can
choose a random walk with small enough steps, so that $\lambda$ can be arbitrarily
large and $Df$ arbitrarily close to $|\nabla f|$. Since moreover $\sigma(x)^2$ does not depend
on $x$ for the Brownian motion, this theorem allows one to recover the logarithmic
Sobolev inequality in the Bakry-Emery framework, with the correct constant
up to a factor 4.

Now consider the two-point space $\{0, 1\}$, equipped with the measure
$\nu(0) = 1 - p$ and $\nu(1) = p$. This is a classical space on which modified
logarithmic Sobolev inequalities were introduced [BL98]. We endow this space
with the Markov chain sending each point to the invariant distribution. Here
we have $\sigma(x)^2 = p(1-p)$, $n_x = 1$ and $\kappa = 1$, so that we get the inequality
$\mathrm{Ent}_\nu f \leq 4p(1-p) \int \frac{(Df)^2}{f} \, d\nu$, identical to the known inequality [BL98] except
for the factor 4.

Tensorizing this result provides a modified logarithmic Sobolev inequality for
Bernoulli and Poisson measures [BL98]. If, instead, we directly apply the
theorem above to the Bernoulli measure on $\{0, 1\}^N$ or the Poisson measure
on $\mathbb{N}$ (see Sections 3.3.3 and 3.3.4), we get slightly worse results. Indeed,
consider the $M/M/\infty$ queue on $\mathbb{N}$, which is the limit when $N \to \infty$ of the
projection on $\mathbb{N}$ of the Markov chains on $\{0, 1\}^N$ associated with Bernoulli
measures. Keeping the notation of Section 3.3.4, we get, in the continuous-time
version, $\sigma(x)^2 = x\mu + \lambda$, which is not constant. So we have to use $V(x)$;
Remark 41 and the formulas in Section 3.3.4 yield $V(x) \leq 8\lambda/\mu + 2(\lambda + x\mu)/\mu$
so that we get the inequality

$$\mathrm{Ent}_\nu f \leq \frac{\lambda}{\mu} \int \frac{Df(x)^2}{f(x)} \left(10 + 2x\mu/\lambda\right) d\nu(x)$$

which is to be compared to the inequality

$$\mathrm{Ent}_\nu f \leq \frac{\lambda}{\mu} \int \frac{D_+ f(x)^2}{f(x)} \, d\nu(x)$$

obtained in [BL98], with $D_+ f(x) = f(x+1) - f(x)$. So asymptotically our
version is worse by a factor $x$. Note however that the Poisson measure satisfies
$\frac{x\mu}{\lambda} \, d\nu(x) = d\nu(x-1)$, so one could say that our general, non-local notion
of gradient fails to distinguish between a point and an immediate neighbor,
and does not take advantage of the particular structure of a random walk on
$\mathbb{N}$.

Proof. We now turn to the proof of Theorem 40, which is essentially a
copy of the Bakry-Emery argument. The key property is Proposition 43,
a commutation property between the gradient and random walk operators
stating that $DM \leq (1 - \kappa/2)MD$.

Lemma 42 - Let $A$ be a function on $\mathrm{Supp}\, m_x$, such that $A(z) \leq e^{\rho} A(z')$
for any $z, z' \in \mathrm{Supp}\, m_x$, with $\rho \leq \frac{1}{2(1+U)}$. Then for any $x, y \in X$ we have

$$\int_z A(z) \, \frac{d(x+z, y+z)}{d(x,y)} \leq (1 - \kappa(x,y)/2) \int_z A(z)$$

and in particular

$$\int_z A(z) \bigl(d(x+z, y+z) - d(x, y)\bigr) \leq 0.$$

Proof - Set $F = \max_z A(z)$. Then

$$\int_z A(z) \, \frac{d(x+z, y+z)}{d(x,y)} = \int_z A(z) + F \int_z \frac{A(z)}{F} \left(\frac{d(x+z, y+z)}{d(x,y)} - 1\right)$$

and recall that, by definition, $\kappa_-(x,y) = \int_{z,\, d(x+z,y+z) > d(x,y)} \bigl(d(x+z, y+z)/d(x,y) - 1\bigr)$
and $\kappa_+(x,y) = \int_{z,\, d(x+z,y+z) < d(x,y)} \bigl(1 - d(x+z, y+z)/d(x,y)\bigr)$. Using that
$A(z) \leq F$ on one hand and $A(z) \geq e^{-\rho} F$ on the other hand, we get

$$\int_z A(z) \, \frac{d(x+z, y+z)}{d(x,y)} \leq \int_z A(z) + F \bigl(\kappa_-(x,y) - e^{-\rho} \kappa_+(x,y)\bigr).$$

Now, recall that by definition of $U$ we have $\kappa_-(x,y) \leq U \kappa(x,y)$. It is
not difficult to check that $\rho \leq \frac{1}{2(1+U)}$ is enough to ensure that $e^{-\rho} \kappa_+(x, y) -
\kappa_-(x,y) \geq \kappa(x,y)/2$, hence

$$\int_z A(z) \, \frac{d(x+z, y+z)}{d(x,y)} \leq \int_z A(z) - F \kappa(x,y)/2 \leq \int_z A(z) \bigl(1 - \kappa(x,y)/2\bigr)$$

as needed. □
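The inequality left to the reader in this proof can be spelled out as follows. This is a sketch that uses only $\kappa = \kappa_+ - \kappa_-$, the bound $\kappa_- \leq U\kappa$, the elementary inequality $e^{-\rho} \geq 1 - \rho$ and the assumption $\rho(1+U) \leq 1/2$; it assumes $\kappa(x,y) \geq 0$, and all quantities are evaluated at the pair $(x,y)$.

```latex
\begin{align*}
e^{-\rho}\kappa_+ - \kappa_-
  &= e^{-\rho}(\kappa + \kappa_-) - \kappa_-
   = e^{-\rho}\kappa - (1 - e^{-\rho})\,\kappa_- \\
  &\geq e^{-\rho}\kappa - (1 - e^{-\rho})\,U\kappa
   = \bigl(e^{-\rho}(1+U) - U\bigr)\,\kappa \\
  &\geq \bigl((1-\rho)(1+U) - U\bigr)\,\kappa
   = \bigl(1 - \rho(1+U)\bigr)\,\kappa
   \;\geq\; \kappa/2 .
\end{align*}
```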

Proposition 43 - Suppose that the Ricci curvature is at least $\kappa > 0$,
and choose some $\lambda \leq \frac{1}{24\sigma_\infty(1+U)}$. Then for any function $f : X \to \mathbb{R}$ we have

$$D(Mf)(x) \leq (1 - \kappa/2) \, M(Df)(x)$$

Proof - For any $y, y' \in X$ we have

$$\begin{aligned}
\frac{|Mf(y) - Mf(y')|}{d(y,y')} \, e^{-\lambda(d(x,y) + d(x,y'))}
 &\leq \int_z |f(y+z) - f(y'+z)| \, \frac{e^{-\lambda(d(x,y) + d(x,y'))}}{d(y,y')} \\
 &\leq \int_z Df(x+z) \, \frac{d(y+z, y'+z)}{d(y,y')} \, \frac{e^{-\lambda(d(x,y) + d(x,y'))}}{e^{-\lambda(d(x+z,y+z) + d(x+z,y'+z))}} \\
 &= \int_z A(z) B(z) \, \frac{d(y+z, y'+z)}{d(y,y')}
\end{aligned}$$

where $A(z) = Df(x+z)$ and $B(z) = e^{\lambda(d(x+z,y+z) - d(x,y) + d(x+z,y'+z) - d(x,y'))}$.
For any $z$ we have $(1 - \kappa(x,y)) d(x,y) - 4\sigma_\infty \leq d(x+z, y+z) \leq (1 -
\kappa(x,y)) d(x, y) + 4\sigma_\infty$ and likewise for $y'$, so that $B$ varies by a factor at most
$e^{8\lambda\sigma_\infty}$. Likewise, since $Df$ is $2\lambda$-log-Lipschitz, $A$ varies by a factor at most
$e^{4\lambda\sigma_\infty}$. So the quantity $A(z)B(z)$ varies by at most $e^{12\lambda\sigma_\infty}$.

So if $\lambda \leq \frac{1}{24\sigma_\infty(1+U)}$, we can apply Lemma 42 and get

$$\int_z A(z) B(z) \, \frac{d(y+z, y'+z)}{d(y,y')} \leq (1 - \kappa/2) \int_z A(z) B(z)$$

Now we have $\int_z A(z)B(z) = \int_z A(z) + \int_z A(z)(B(z) - 1)$. Unwinding $B(z)$
and using that $e^a - 1 \leq a e^a$ for any $a \in \mathbb{R}$, we get

$$\int_z A(z)(B(z) - 1) \leq \lambda \int_z A(z) B(z) \bigl(d(x+z, y+z) - d(x, y) + d(x+z, y'+z) - d(x, y')\bigr)$$

which is non-positive by Lemma 42. Hence $\int_z A(z)B(z) \leq \int_z A(z)$, which
ends the proof. □

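Let $\nu$ be the invariant distribution. Let $f$ be a positive function with
$\int f \, d\nu = 1$. We know that

$$\mathrm{Ent}\, f = \int_x Mf(x) \left(\mathrm{Ent}_{m_x} \frac{f}{Mf(x)}\right) d\nu(x) + \mathrm{Ent}\, Mf$$

(an identity which can be iterated along the functions $M^t f$) and similarly

$$\mathrm{Var}\, f = \sum_{t \geq 0} \int_x \mathrm{Var}_{m_x} M^t f \, d\nu(x)$$

Now for any $y, z \in \mathrm{Supp}\, m_x$ we have $|f(y) - f(z)| \leq Df(y) \, d(y, z) \, e^{\lambda d(y,z)}$.
Since $Df$ is $2\lambda$-log-Lipschitz, we have $Df(y) \leq e^{4\lambda\sigma_\infty} M(Df)(x)$, so that
$|f(y) - f(z)| \leq d(y,z) \, M(Df)(x) \, e^{6\lambda\sigma_\infty}$, i.e. $f$ is $M(Df)(x) \, e^{6\lambda\sigma_\infty}$-Lipschitz.
Consequently, since $e^{12\lambda\sigma_\infty} \leq 2$,

$$\mathrm{Var}_{m_x} f \leq \frac{2 \, (M(Df)(x))^2 \, \sigma(x)^2}{n_x}$$

and, using that $a \log a \leq a^2 - a$, we get that $\mathrm{Ent}_{m_x} \frac{f}{Mf(x)} \leq \frac{1}{Mf(x)^2} \mathrm{Var}_{m_x} f$ so

$$\mathrm{Ent}_{m_x} \frac{f}{Mf(x)} \leq \frac{2 \, (M(Df)(x))^2 \, \sigma(x)^2}{n_x \, Mf(x)^2}$$

Thus

$$\mathrm{Var}\, f \leq 2 \sum_{t \geq 0} \int_x \frac{\sigma(x)^2}{n_x} \, (M(DM^t f)(x))^2 \, d\nu(x)$$

and

$$\mathrm{Ent}\, f \leq 2 \sum_{t \geq 0} \int_x \frac{\sigma(x)^2}{n_x} \, \frac{(M(DM^t f)(x))^2}{M^{t+1} f(x)} \, d\nu(x)$$

By Proposition 43 we have $(DM^t f)(y) \leq (1 - \kappa/2)^t M^t(Df)(y)$, so that

$$\mathrm{Var}\, f \leq 2 \sum_{t \geq 0} \int_x \frac{\sigma(x)^2}{n_x} \, (M^{t+1} Df(x))^2 \, (1 - \kappa/2)^{2t} \, d\nu(x)$$

and

$$\mathrm{Ent}\, f \leq 2 \sum_{t \geq 0} \int_x \frac{\sigma(x)^2}{n_x} \, \frac{(M^{t+1} Df(x))^2}{M^{t+1} f(x)} \, (1 - \kappa/2)^{2t} \, d\nu(x)$$

Now since the norm of $M$ acting on $L^2(\nu)$ is at most 1, we have

$$\mathrm{Var}\, f \leq 2 \sup_x \frac{\sigma(x)^2}{n_x} \sum_{t \geq 0} (1 - \kappa/2)^{2t} \int_x (M^{t+1} Df(x))^2 \, d\nu(x)
 \leq \frac{4}{\kappa} \sup_x \frac{\sigma(x)^2}{n_x} \int_x (Df(x))^2 \, d\nu(x)$$

For the entropy of $f$, the Cauchy-Schwarz inequality yields

$$(M^{t+1} Df(x))^2 = \left(M^{t+1}\!\left(\frac{Df}{\sqrt{f}} \sqrt{f}\right)(x)\right)^2 \leq M^{t+1}\!\left(\frac{(Df)^2}{f}\right)(x) \; M^{t+1} f(x)$$

so that finally

$$\mathrm{Ent}\, f \leq 2 \sup_x \frac{\sigma(x)^2}{n_x} \sum_{t \geq 0} \int_x M^{t+1}\!\left(\frac{(Df)^2}{f}\right)(x) \, (1 - \kappa/2)^{2t} \, d\nu(x)
 \leq \frac{4}{\kappa} \sup_x \frac{\sigma(x)^2}{n_x} \int_x \frac{(Df(x))^2}{f(x)} \, d\nu(x)$$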

5 Exponential concentration in non-negative curvature

We have seen that positive Ricci curvature implies a kind of Gaussian concentration.
We now show that non-negative Ricci curvature and the existence
of an "attracting point" imply exponential concentration.

The basic example to keep in mind is the following. Let $\mathbb{N}$ be the set of
non-negative integers equipped with its standard distance. Let $0 < p < 1$
and consider the nearest-neighbor random walk on $\mathbb{N}$ that goes to the left with
probability $p$; explicitly $m_k = p\delta_{k-1} + (1-p)\delta_{k+1}$ for $k \geq 1$, and $m_0 =
p\delta_0 + (1-p)\delta_1$.

Since for $k \geq 1$ the transition kernel is translation-invariant, it is immediate
to check that $\kappa(k, k+1) = 0$. Besides, $\kappa(0, 1) = p$. There exists an invariant
distribution if and only if $p > 1/2$, and it satisfies exponential concentration
with characteristic decay distance $1/\log(p/(1-p))$. For $p = 1/2 + \varepsilon$ with
small $\varepsilon$ this behaves like $1/4\varepsilon$.
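The curvature values and the decay distance of this example can be checked numerically. The sketch below is illustrative only: it computes the $L^1$ transportation distance between two finitely supported measures on the integers through their cumulative distribution functions, and the function names are assumptions.

```python
import math

def walk_measure(k, p):
    """Law m_k of one step of the walk started at k (left with probability p)."""
    if k == 0:
        return {0: p, 1: 1 - p}
    return {k - 1: p, k + 1: 1 - p}

def w1_on_integers(mu, nu):
    """L^1 transportation distance on Z: sum over j of |F_mu(j) - F_nu(j)|."""
    lo = min(min(mu), min(nu))
    hi = max(max(mu), max(nu))
    total, cdf_gap = 0.0, 0.0
    for j in range(lo, hi + 1):
        cdf_gap += mu.get(j, 0.0) - nu.get(j, 0.0)
        total += abs(cdf_gap)
    return total

def coarse_ricci(k, p):
    """kappa(k, k+1) = 1 - T_1(m_k, m_{k+1}), since d(k, k+1) = 1."""
    return 1.0 - w1_on_integers(walk_measure(k, p), walk_measure(k + 1, p))

def decay_distance(p):
    """Characteristic decay distance of the geometric invariant law, for p > 1/2."""
    return 1.0 / math.log(p / (1 - p))
```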

Geometrically, what entails exponential concentration in this example is
the fact that, for $p > 1/2$, the point 0 "pulls" its neighbor, and the pulling is
transmitted by non-negative Ricci curvature. We now formalize this situation
in the following theorem.

Theorem 44 - Let $(X, d, (m_x))$ be a metric space with random walk.
Suppose that for some $o \in X$ and $r > 0$ one has:

• $\kappa(x, y) \geq 0$ for all $x, y \in X$,

• for all $x \in X$ with $r \leq d(o, x) < 2r$, one has $T_1(m_x, \delta_o) < d(x, o)$,

• $X$ is $r$-geodesic,

• there exists $s > 0$ such that each measure $m_x$ satisfies the Gaussian-type
Laplace transform inequality

$$\mathbb{E}_{m_x} e^{\lambda f} \leq e^{\lambda^2 s^2/2} \, e^{\lambda \mathbb{E}_{m_x} f}$$

for any $\lambda > 0$ and any 1-Lipschitz function $f : \mathrm{Supp}\, m_x \to \mathbb{R}$.

Set $\rho = \inf\{d(x, o) - T_1(m_x, \delta_o),\ r \leq d(o, x) < 2r\}$ and assume $\rho > 0$.

Then there exists an invariant distribution for the random walk. Moreover,
setting $D = s^2/\rho$ and $m = r + 2s^2/\rho + \rho(1 + J(o)^2/4s^2)$, for any invariant
distribution $\nu$ we have

$$\int e^{d(x,o)/D} \, d\nu(x) \leq \left(4 + J(o)^2/s^2\right) e^{m/D}$$

and so for any 1-Lipschitz function $f : X \to \mathbb{R}$ and $t \geq 0$ we have

$$\Pr\left(|f - f(o)| \geq t + m\right) \leq \left(8 + 2J(o)^2/s^2\right) e^{-t/D}$$

So we get exponential concentration with characteristic decay distance
$s^2/\rho$.

Note that the last assumption is satisfied with $s = 2\sigma_\infty$ thanks to Proposition
1.16 in [Led01].

Before proceeding to the proof, let us show how this applies to the geometric
distribution above on $\mathbb{N}$. We take of course $o = 0$ and $r = 1$. We can
take $s = 2\sigma_\infty = 2$. Now there is only one point $x$ with $r \leq d(o,x) < 2r$,
which is $x = 1$. It satisfies $m_1 = p\delta_0 + (1-p)\delta_2$, so that $T_1(m_1, \delta_0) = 2(1-p)$,
which is smaller than $d(0, 1) = 1$ if and only if $p > 1/2$ as was to be expected.
So we can take $\rho = 1 - 2(1-p) = 2p - 1$. We get exponential concentration
with characteristic distance $4/(2p-1)$. When $p$ is very close to 1 this is not
so good (because the discretization is too coarse), but when $p = 1/2 + \varepsilon$ is close
to 1/2 this is $2/\varepsilon$, which is within a factor 8 of the optimal value $1/4\varepsilon$.
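The constants of Theorem 44 for this walk can be evaluated explicitly and the resulting deviation bound compared with the exact geometric tail. The sketch below assumes $s = 2$, $r = 1$, and that the jump $J(o)$ is the mean displacement under $m_0$, namely $1-p$; the function names are illustrative.

```python
import math

def theorem44_constants(p, s=2.0, r=1.0):
    """Return (rho, D, m, prefactor) for the walk on N that goes left with probability p > 1/2."""
    t1_m1_to_origin = 2 * (1 - p)      # T_1(m_1, delta_0): m_1 puts mass 1-p at distance 2
    rho = 1.0 - t1_m1_to_origin        # d(0,1) - T_1(m_1, delta_0) = 2p - 1
    jump_at_origin = 1 - p             # assumed meaning of J(o): mean displacement under m_0
    D = s ** 2 / rho
    m = r + 2 * s ** 2 / rho + rho * (1 + jump_at_origin ** 2 / (4 * s ** 2))
    prefactor = 8 + 2 * jump_at_origin ** 2 / s ** 2
    return rho, D, m, prefactor

def geometric_tail(p, threshold):
    """nu(k >= threshold) for the invariant law nu(k) = (1-q) q^k, q = (1-p)/p, threshold >= 0."""
    q = (1 - p) / p
    return q ** math.ceil(threshold)
```

Taking $f(k) = k$, so that $f(o) = 0$, the exact tail `geometric_tail(p, t + m)` can be compared with `prefactor * exp(-t / D)` for any $t \geq 0$.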

Another example is the stochastic differential equation $dX_t = S \, dB_t -
\alpha \frac{X_t}{|X_t|} \, dt$ on $\mathbb{R}^n$, for which $\exp(-|x| \alpha/S^2)$ is a reversible measure. Consider
the Euler scheme at time $\delta t$ for this stochastic differential equation. Taking
$r = nS^2/\alpha$ yields that $\rho \geq \alpha \, \delta t/2$ after some simple computation. Since
we have $s^2 = S^2 \delta t$ for Gaussian measures at time $\delta t$, we get exponential
concentration with characteristic decay distance $2S^2/\alpha$, which is correct up
to a factor 2. The additive constant in the deviation inequality is $m =
r + \rho(1 + J(o)^2/4s^2) + 2s^2/\rho$ which is equal to $(n+4)S^2/\alpha + O(\delta t)$ (note that
$J(o)^2 \approx s^2$), which is the correct order of magnitude for the average distance
to the origin in dimension $n$.

If $\kappa > 0$ in some large enough ball around $o$, then the invariant distribution
is unique. However, this is not true in general: for example, start
with the random walk on $\mathbb{N}$ above with a geometric invariant distribution;
now consider the disjoint union $\mathbb{N} \cup (\mathbb{N} + \frac{1}{2})$ where we keep the same random
walk on $\mathbb{N}$ and the same walk translated by $\frac{1}{2}$ on $\mathbb{N} + \frac{1}{2}$: clearly there are
two disjoint invariant distributions; however, curvature is non-negative and
the assumptions of the theorem are satisfied with $r = 1$ and $o = 0$.
Proof of the theorem 

Let us first prove a lemma which shows how non-negative curvature trans- 
mits the "pulling". 

Lemma 45 - Let $x \in X$ with $d(x, o) \geq r$. Then $T_1(m_x, \delta_o) \leq d(x, o) - \rho$.

Proof - If $d(o, x) < 2r$ then this is one of the assumptions. So we suppose
that $d(o, x) \geq 2r$.

Since $X$ is $r$-geodesic, let $o = y_0, y_1, y_2, \ldots, y_n = x$ be a sequence of
points with $d(y_i, y_{i+1}) \leq r$ and $\sum d(y_i, y_{i+1}) = d(o,x)$. We can assume that
$d(o, y_2) > r$ (otherwise, remove $y_1$). Set $z = y_1$ if $d(o, y_1) = r$ and $z = y_2$ if
$d(o, y_1) < r$, so that $r \leq d(o, z) < 2r$. Now

$$T_1(\delta_o, m_x) \leq T_1(\delta_o, m_z) + T_1(m_z, m_x) \leq d(o, z) - \rho + d(z, x)$$

since $\kappa(z, x) \geq 0$. The conclusion follows from the fact that $d(o, x) = d(o, z) +
d(z,x)$. □

We are now ready to prove the theorem. The idea is to consider the
function $e^{\lambda d(x,o)}$. For points far away from the origin, since under the random
walk the average distance to the origin decreases by $\rho$ by the previous lemma,
we expect the function to be multiplied by $e^{-\lambda\rho}$ under the random walk
operator. Close to the origin, the evolution of the function is controlled by
the variance $s^2$ and the jump $J(o)$ of the origin. Since the integral of the
function is preserved by the random walk operator, and it is multiplied by a
quantity $< 1$ far away, this shows that the weight of faraway points cannot
be too large.

More precisely, we need to tamper a little bit with what happens around
the origin. Let $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ be defined by $\varphi(x) = 0$ if $x < r$; $\varphi(x) =
(x-r)^2/kr$ if $r \leq x < r(\frac{k}{2} + 1)$ and $\varphi(x) = x - r - kr/4$ if $x \geq r(\frac{k}{2} + 1)$,
for some $k > 0$ to be chosen later. Note that $\varphi$ is a 1-Lipschitz function and
that $\varphi'' \leq 2/kr$.

If $Y$ is any random variable with values in $\mathbb{R}_+$, we have

$$\mathbb{E}\varphi(Y) \leq \varphi(\mathbb{E}Y) + \frac{1}{2} \mathrm{Var}\, Y \, \sup \varphi'' \leq \varphi(\mathbb{E}Y) + \frac{1}{kr} \mathrm{Var}\, Y$$

Now choose some $\lambda > 0$ and consider the function $f : X \to \mathbb{R}$ defined by
$f(x) = e^{\lambda \varphi(d(o,x))}$. Note that $\varphi(d(o,x))$ is 1-Lipschitz, so that by the Laplace
transform assumption we have

$$Mf(x) \leq e^{\lambda^2 s^2/2} \, e^{\lambda M\varphi(d(o,x))}$$

The Laplace transform assumption implies that the variance under $m_x$ of
any 1-Lipschitz function is at most $s^2$. So by the remark above, we have

$$M\varphi(d(o,x)) \leq \varphi(T_1(m_x, \delta_o)) + \frac{s^2}{kr}$$

so that finally

$$Mf(x) \leq e^{\lambda^2 s^2/2 + \lambda s^2/kr} \, e^{\lambda \varphi(T_1(m_x, \delta_o))}$$

So for any $x$ with $d(o, x) \geq r$, we get

$$Mf(x) \leq e^{\lambda^2 s^2/2 + \lambda s^2/kr} \, e^{\lambda \varphi(d(x,o) - \rho)}$$

If $d(x, o) \geq r(\frac{k}{2} + 1) + \rho$ then $\varphi(d(x, o) - \rho) = \varphi(d(x, o)) - \rho$ so that

$$Mf(x) \leq e^{\lambda^2 s^2/2 + \lambda s^2/kr - \lambda\rho} \, f(x)$$

If $r \leq d(x, o) < r(\frac{k}{2} + 1) + \rho$, then $\varphi(d(x, o) - \rho) \leq \varphi(d(x, o))$ so that

$$Mf(x) \leq e^{\lambda^2 s^2/2 + \lambda s^2/kr} \, f(x)$$

If, finally, $d(x, o) < r$, then use non-negative curvature to write $T_1(m_x, \delta_o) \leq
T_1(m_x, m_o) + J(o) \leq d(x,o) + J(o)$ so that $\varphi(T_1(m_x, \delta_o)) \leq \varphi(r + J(o)) =
J(o)^2/kr$ and

$$Mf(x) \leq e^{\lambda^2 s^2/2 + \lambda s^2/kr + \lambda J(o)^2/kr} \, f(x)$$

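Let $\nu$ be a probability measure such that $\int f \, d\nu < \infty$. Let $X' = \{x \in
X,\ d(x,o) < r(\frac{k}{2} + 1)\}$ and $X'' = X \setminus X'$. Set $A(\nu) = \int_{X'} f \, d\nu$ and $B(\nu) =
\int_{X''} f \, d\nu$. We have shown that

$$\int f \, d(\nu * m) = \int Mf \, d\nu = \int_{X'} Mf \, d\nu + \int_{X''} Mf \, d\nu$$

so that

$$A(\nu * m) + B(\nu * m) \leq \alpha A(\nu) + \beta B(\nu)$$

with $\alpha = e^{\lambda^2 s^2/2 + \lambda s^2/kr + \lambda J(o)^2/kr}$ and $\beta = e^{\lambda^2 s^2/2 + \lambda s^2/kr - \lambda\rho}$.

Choose $\lambda$ small enough and $k$ large enough (see below) so that $\beta < 1$.
Using that $A(\nu) \leq e^{\lambda kr/4}$ for any probability measure $\nu$, we get $\alpha A(\nu) + \beta B(\nu) \leq (\alpha -
\beta) e^{\lambda kr/4} + \beta(A(\nu) + B(\nu))$. In particular, if $A(\nu) + B(\nu) \leq \frac{(\alpha - \beta) e^{\lambda kr/4}}{1 - \beta}$, we get
$\alpha A(\nu) + \beta B(\nu) \leq \frac{(\alpha - \beta) e^{\lambda kr/4}}{1 - \beta}$. So setting $R = \frac{(\alpha - \beta) e^{\lambda kr/4}}{1 - \beta}$, we have just shown
that the set $\mathcal{C}$ of probability measures $\nu$ such that $\int f \, d\nu \leq R$ is invariant
under the random walk.

Moreover, if $A(\nu) + B(\nu) > R$ then $\alpha A(\nu) + \beta B(\nu) < A(\nu) + B(\nu)$.
Hence, if $\nu$ is an invariant distribution, necessarily $\nu \in \mathcal{C}$. This, together with
an evaluation of $R$ given below, provides the bound for $\int f \, d\nu$ stated in the
theorem.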

We now turn to the existence of an invariant distribution. First, $\mathcal{C}$ is obviously
closed and convex. Moreover, $\mathcal{C}$ is tight: indeed if $K$ is a compact, say
included in a ball of radius $a$ around $o$, then for any $\nu \in \mathcal{C}$ we have $\nu(X \setminus K) \leq
R e^{-\lambda a}$. So by Prokhorov's theorem, $\mathcal{C}$ is compact in the weak convergence
topology. So $\mathcal{C}$ is compact convex in the topological vector space of all
(signed) Borel measures on $X$, and is invariant by the random walk operator,
which is an affine map. By the Markov-Kakutani theorem (Theorem I.3.3.1
in [GD03]), it has a fixed point.

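Let us finally evaluate $R$. We have

$$\begin{aligned}
R &= \frac{\alpha/\beta - 1}{1/\beta - 1}\, e^{\lambda kr/4}
   = \frac{e^{\lambda J(o)^2/kr + \lambda\rho} - 1}{e^{\lambda\rho - \lambda s^2/kr - \lambda^2 s^2/2} - 1}\, e^{\lambda kr/4} \\
  &\le \frac{\rho + J(o)^2/kr}{\rho - s^2/kr - \lambda s^2/2}\, e^{\lambda J(o)^2/kr + \lambda\rho + \lambda kr/4}
\end{aligned}$$

using $e^a - 1 \le a e^a$ and $e^a - 1 \ge a$.

Now take $\lambda = \rho/s^2$ and $k = 4s^2/r\rho$. This yields

$$R \le \left(4 + J(o)^2/s^2\right) e^{\lambda s^2/\rho + \lambda\rho\,(1 + J(o)^2/4s^2)}$$

Let $\nu$ be some invariant distribution. Since $d(x,o) \le \varphi(d(x,o)) + r(1 + k/4)$
we have $\int e^{\lambda d(x,o)}\,d\nu \le e^{\lambda r(1+k/4)} \int f\,d\nu \le R\,e^{\lambda r(1+k/4)}$,
hence the result in the theorem. □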
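The evaluation of $R$ above can be sanity-checked numerically. The following is a minimal sketch, assuming the expressions for $\alpha$, $\beta$ and $R$ read as in the proof with $\lambda = \rho/s^2$ and $k = 4s^2/r\rho$; the function name `invariant_moment_bound` and the sample parameter values are hypothetical, chosen only for illustration.

```python
import math


def invariant_moment_bound(rho, s, r, J):
    """Return (R, closed_form, beta) for lambda = rho/s^2 and k = 4 s^2/(r rho).

    R is the exact constant (alpha - beta) e^{lambda k r / 4} / (1 - beta);
    closed_form is the upper bound (4 + J^2/s^2) exp(...) derived in the proof.
    All argument names are illustrative stand-ins for rho, s, r and J(o).
    """
    lam = rho / s**2
    k = 4 * s**2 / (r * rho)
    kr = k * r
    common = lam**2 * s**2 / 2 + lam * s**2 / kr
    alpha = math.exp(common + lam * J**2 / kr)
    beta = math.exp(common - lam * rho)
    R = (alpha - beta) / (1 - beta) * math.exp(lam * kr / 4)
    closed_form = (4 + J**2 / s**2) * math.exp(
        lam * s**2 / rho + lam * rho * (1 + J**2 / (4 * s**2))
    )
    return R, closed_form, beta


if __name__ == "__main__":
    for params in [(0.5, 1.0, 2.0, 0.3), (1.0, 2.0, 1.0, 1.5)]:
        R, bound, beta = invariant_moment_bound(*params)
        print(params, "R =", R, "bound =", bound, "beta =", beta)
```

With this choice of $\lambda$ and $k$ the exponent of $\beta$ equals $-\rho^2/4s^2$, so $\beta < 1$ holds automatically, which is what the check on `beta` reflects.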



6 Ricci curvature and Gromov-Hausdorff topology

We introduce here a Gromov-Hausdorff-like topology for metric spaces equipped
with a random walk. Two spaces are close in this topology if they are close in
the Gromov-Hausdorff topology and if moreover, the measures issuing from
each point $x$ are (uniformly) close in the $L^1$ transportation distance. More
precisely:

Definition 46 - Let $(X,(m_x)_{x\in X})$ and $(Y,(m_y)_{y\in Y})$ be two metric spaces
equipped with a random walk. For $\varepsilon > 0$, we say that these spaces are
$\varepsilon$-close if there exists a metric space $Z$ and two isometric embeddings
$f_X : X \hookrightarrow Z$, $f_Y : Y \hookrightarrow Z$ such that the Hausdorff distance
between $f_X(X)$ and $f_Y(Y)$ is at most $\varepsilon$, and, moreover, for any $x \in X$,
there exists $y \in Y$ such that $d_Z(f_X(x), f_Y(y)) \le \varepsilon$ and the $L^1$
transportation distance between the pushforward measures $f_X(m_x)$ and $f_Y(m_y)$ is
at most $2\varepsilon$, and likewise for any $y \in Y$.

The Ricci curvature is a continuous function in this topology. Namely, a
limit of spaces with Ricci curvature at least $\kappa$ has Ricci curvature at least $\kappa$.

Below, we will relax the definition of Ricci curvature so as to allow any
variation at small scale; with this perturbed definition, having Ricci curvature
greater than $\kappa$ will become an open property. In particular, any space close
to a space with positive Ricci curvature will have positive Ricci curvature in
this perturbed sense.

Proposition 47 - Let $(X^N,(m^N_x)_{x\in X^N})$ be a sequence of metric spaces
with random walk, converging to a metric space with random walk $(X,(m_x)_{x\in X})$.
Let $x, y$ be two distinct points in $X$ and let $(x^N, y^N) \in X^N \times X^N$ be a
sequence of pairs of points converging to $(x,y)$. Then $\kappa(x^N,y^N) \to \kappa(x,y)$.

In particular, if all spaces $X^N$ have Ricci curvature at least $\kappa$, then so
does $X$.

In order for positive curvature to be an open property in some topology
à la Gromov-Hausdorff, one needs a rougher behavior at small scales. This
is achieved as follows.

Definition 48 - Let $(X,d)$ be a metric space equipped with a random
walk $m$. Let $\delta \ge 0$. The Ricci curvature up to $\delta$ along $x, y \in X$ is

$$\kappa^\delta(x,y) := 1 - \frac{(T_1(m_x,m_y) - \delta)_+}{d(x,y)}$$

i.e. it is the largest $\kappa \le 1$ for which one has

$$T_1(m_x,m_y) \le (1-\kappa)\,d(x,y) + \delta$$
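As a small illustration of Definition 48, the sketch below evaluates the curvature up to $\delta$ from a given transportation distance. It assumes the formula $\kappa^\delta(x,y) = 1 - (T_1(m_x,m_y)-\delta)_+/d(x,y)$; the function name `ricci_up_to_delta` and the numerical inputs are hypothetical, and the value of $T_1$ is supplied by the caller rather than computed.

```python
def ricci_up_to_delta(t1, dist, delta=0.0):
    """Curvature up to delta: 1 - (t1 - delta)_+ / dist.

    t1    -- L^1 transportation distance between m_x and m_y (given, not computed here)
    dist  -- d(x, y), must be positive
    delta -- tolerance at small scale, delta >= 0
    """
    if dist <= 0:
        raise ValueError("d(x, y) must be positive")
    return 1.0 - max(t1 - delta, 0.0) / dist


if __name__ == "__main__":
    # Increasing delta can only increase the curvature, which saturates at 1.
    for delta in (0.0, 0.2, 1.0):
        print(delta, ricci_up_to_delta(0.9, 1.0, delta))
```

The returned value is the largest $\kappa \le 1$ satisfying the defining inequality: once $\delta \ge T_1(m_x,m_y)$ the positive part vanishes and the curvature is capped at $1$.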



With this definition, the following is easy. 

Proposition 49 - Let $(X,(m_x))$ be a metric space with random walk
with Ricci curvature at least $\kappa$ up to $\delta \ge 0$. Let $\delta' > 0$. Then there
exists a neighborhood $V_X$ of $X$ such that any space $Y \in V_X$ has Ricci curvature at
least $\kappa$ up to $\delta + \delta'$.

Consequently, the property "having curvature at least $\kappa$ for some $\delta \ge 0$"
is open.

7 $L^2$ Bonnet-Myers theorems

As seen in Section 2.3, it is generally not possible to give a bound for the
diameter of a positively curved space involving the square root of curvature,
because of such simple counterexamples as the discrete cube. Here we
describe additional conditions which provide such a bound in two different
types of situation.

We first give a bound similar to the Bonnet-Myers one, but on the average
distance between two points rather than the diameter; it holds when there
is an "attractive point" and is relevant for examples such as the
Ornstein-Uhlenbeck process (Example 9) or its discrete analogue (Example 10).

Next, we give a direct generalization of the genuine Bonnet-Myers theo- 
rem for Riemannian manifolds. Actually, the only example where a Bonnet- 
Myers theorem holds seems to be the ordinary Brownian motion on a Rieman- 
nian manifold. Despite this lack of further examples, we found it interesting 
to provide an axiomatization of the Bonnet-Myers theorem in our language. 
This is done by reinforcing the positive curvature assumption, which com- 
pares the transportation distance between the measures issuing from two 
points x and y at a given time, by requiring a transportation distance in- 
equality between the measures issuing from two given points at different 
times. 



7.1 Average $L^2$ Bonnet-Myers

We now describe a Bonnet-Myers-like estimate on the average distance between
two points, provided there is some "attractive point". This is rather
similar to Theorem 44 in non-negative curvature.

Proposition 50 (Average $L^2$ Bonnet-Myers) Let $(X,d,(m_x))$
be a metric space with random walk, with Ricci curvature at least $\kappa > 0$.
Suppose that for some $o \in X$ and $r \ge 0$, one has

$$\int d(o,y)\,dm_x(y) \le d(o,x)$$

for any $x \in X$ with $r \le d(o,x) < 2r$, and that moreover $X$ is $r$-geodesic.
Then

$$\int d(o,x)\,d\nu(x) \le \sqrt{\frac{1}{\kappa}\int \frac{\sigma(x)^2}{n_x}\,d\nu(x)} + 5r$$

where as usual $\nu$ is the invariant distribution.

Note that the assumption $\int d(o,y)\,dm_x(y) \le d(o,x)$ cannot hold for $x$ in
some ball around $o$ unless $o$ is a fixed point. This is why the assumption is
restricted to an annulus.

As in the Gaussian concentration theorem (Theorem 32), in case $\sigma(x)^2$ is
Lipschitz, Corollary 22 may provide a useful bound on
$\int \frac{\sigma(x)^2}{n_x}\,d\nu(x)$ in terms of its value at some point.

As a first example, consider the discrete Ornstein-Uhlenbeck process of
Example 10, which is the Markov chain on $\{-N,\dots,N\}$ given by the
transition probabilities $p_{k,k} = 1/2$, $p_{k,k+1} = 1/4 - k/4N$ and
$p_{k,k-1} = 1/4 + k/4N$; the Ricci curvature is $\kappa = 1/2N$, and the invariant
distribution is the binomial $\frac{1}{2^{2N}}\binom{2N}{N+k}$. This example is
interesting because the diameter is $2N$ (as is the bound provided by
Proposition 23), whereas the average distance between two points is
$\approx \sqrt{N}$. It is immediate to check that $0$ is attractive, namely that
$o = 0$ and $r = 1$ fulfill the assumptions. Since $\sigma(x)^2 \approx 1$ and
$\kappa \approx 1/N$, the proposition recovers the correct order of magnitude for
distance to the origin.
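The claims in this example can be checked exactly on a small chain. The sketch below is only an illustration under stated assumptions: it takes the transition probabilities as written above, uses the standard fact that on the integer line the $L^1$ transportation distance equals the sum of absolute differences of the cumulative distribution functions, and all function names (`transition_row`, `w1_line`, and so on) are hypothetical.

```python
from fractions import Fraction
from math import comb


def transition_row(k, N):
    """Law m_k of one step of the chain started at k, as {state: probability}."""
    row = {k: Fraction(1, 2)}
    up = Fraction(1, 4) - Fraction(k, 4 * N)
    down = Fraction(1, 4) + Fraction(k, 4 * N)
    if up:
        row[k + 1] = up
    if down:
        row[k - 1] = down
    return row


def w1_line(p, q, N):
    """L^1 transportation distance on {-N..N}: sum of |CDF_p - CDF_q|."""
    total = cp = cq = Fraction(0)
    for j in range(-N, N):
        cp += p.get(j, 0)
        cq += q.get(j, 0)
        total += abs(cp - cq)
    return total


def curvature(x, y, N):
    """kappa(x, y) = 1 - T_1(m_x, m_y) / d(x, y) for x != y."""
    return 1 - w1_line(transition_row(x, N), transition_row(y, N), N) / abs(x - y)


def binomial_law(N):
    """Candidate invariant distribution nu(k) = C(2N, N+k) / 2^(2N)."""
    return {k: Fraction(comb(2 * N, N + k), 4**N) for k in range(-N, N + 1)}


def is_invariant(nu, N):
    """Check nu * m == nu exactly."""
    pushed = {}
    for k, weight in nu.items():
        for j, p in transition_row(k, N).items():
            pushed[j] = pushed.get(j, Fraction(0)) + weight * p
    return pushed == nu


def mean_distance_to_origin(N):
    """Average of |k| under the binomial law, as a float."""
    return float(sum(abs(k) * w for k, w in binomial_law(N).items()))


if __name__ == "__main__":
    N = 6
    print(curvature(0, 1, N), is_invariant(binomial_law(N), N))
    print(mean_distance_to_origin(50), 50 ** 0.5)
```

Exact rational arithmetic is used so that the curvature and the invariance of the binomial law can be compared with equality rather than up to rounding.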

Our next example is the Ornstein-Uhlenbeck process $dX_t = -\alpha X_t\,dt + s\,dB_t$
on $\mathbb{R}^N$ (Example 9). Here it is clear that $0$ is attractive in some sense, so
$o = 0$ is a natural choice. The invariant distribution is a Gaussian of variance
$s^2/\alpha$; under this distribution the average distance to $0$ is
$\approx \sqrt{Ns^2/\alpha}$.

At small time $\tau$, a point $x \in \mathbb{R}^N$ is sent to a Gaussian centered at
$(1-\alpha\tau)x$, of variance $\tau s^2$. The average quadratic distance to the origin
under this Gaussian is $(1-\alpha\tau)^2 d(0,x)^2 + Ns^2\tau + o(\tau)$ by a simple
computation. If $d(0,x)^2 \ge Ns^2/2\alpha$ this is less than $d(0,x)^2$, so that we can
take $r = \sqrt{Ns^2/2\alpha}$. Considering the random walk discretized at time $\tau$
we have $\kappa \approx \alpha\tau$, $\sigma(x)^2 \approx Ns^2\tau$ and $n_x \approx N$.
So in the proposition above, the first term is $\approx \sqrt{s^2/\alpha}$, whereas the
second term is $5r \approx \sqrt{Ns^2/\alpha}$, which is thus dominant. So the
proposition gives the correct order of magnitude; in this precise case, the first term
in the proposition reflects concentration of measure (which is dimension-independent
for Gaussians), whereas it is the second term $5r$ which carries the correct
dependency on dimension for the average distance to the origin.

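Proof - Let $\varphi : \mathbb{R} \to \mathbb{R}$ be the function defined by
$\varphi(x) = 0$ if $x \le 2r$, and $\varphi(x) = (x-2r)^2$ otherwise. Note that for any
real-valued random variable $Y$, we have

$$E\varphi(Y) \le \varphi(EY) + \frac{1}{2}\operatorname{Var} Y\, \sup\varphi'' = \varphi(EY) + \operatorname{Var} Y$$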
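The second-order Jensen-type bound just stated can be tested on random finite distributions. This is a minimal sketch, assuming $\varphi(x) = 0$ for $x \le 2r$ and $\varphi(x) = (x-2r)^2$ otherwise; the helper name `second_order_jensen_holds` and the sampling ranges are hypothetical choices made for the illustration.

```python
import random


def phi(x, r):
    """phi(x) = 0 for x <= 2r and (x - 2r)^2 otherwise, so that phi'' <= 2."""
    return 0.0 if x <= 2 * r else (x - 2 * r) ** 2


def second_order_jensen_holds(values, weights, r, tol=1e-9):
    """Check E phi(Y) <= phi(E Y) + Var Y for a finite-valued random variable Y."""
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    e_phi = sum(p * phi(v, r) for p, v in zip(probs, values))
    return e_phi <= phi(mean, r) + var + tol


if __name__ == "__main__":
    rng = random.Random(1)
    vals = [rng.uniform(0, 10) for _ in range(5)]
    wts = [rng.uniform(0.1, 1) for _ in range(5)]
    print(second_order_jensen_holds(vals, wts, r=1.5))
```

The bound follows from $\varphi(y) \le \varphi(m) + \varphi'(m)(y-m) + (y-m)^2$, which is why the variance appears with coefficient $\frac12 \sup\varphi'' = 1$.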

Now let $f : X \to \mathbb{R}$ be defined by $f(x) = \varphi(d(o,x))$. We are going to
show that

$$Mf(x) \le (1-\kappa)^2 f(x) + \frac{\sigma(x)^2}{n_x} + 9r^2$$

for all $x \in X$. Since $\int f\,d\nu = \int Mf\,d\nu$, we will get
$\int f\,d\nu \le (1-\kappa)^2 \int f\,d\nu + \int \frac{\sigma(x)^2}{n_x}\,d\nu + 9r^2$
which easily implies the result.

First, suppose that $r \le d(o,x) < 2r$. We have $f(x) = 0$. Now
$\int d(o,y)\,dm_x(y)$ is at most $d(o,x)$ by assumption. Using the bound above for
$\varphi$, together with the definition of $\sigma(x)^2$ and $n_x$, we get

$$Mf(x) = \int \varphi(d(o,y))\,dm_x(y) \le \varphi\left(\int d(o,y)\,dm_x(y)\right) + \frac{\sigma(x)^2}{n_x} = \frac{\sigma(x)^2}{n_x}$$

since $\int d(o,y)\,dm_x(y) \le 2r$ by assumption.

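Second, suppose that $d(x,o) \ge 2r$. Using that $X$ is $r$-geodesic, we can
find a point $x'$ such that $d(o,x) = d(o,x') + d(x',x)$ and $r \le d(o,x') < 2r$
(take the second point in a sequence joining $o$ to $x$). Now we have

$$\begin{aligned}
\int d(o,y)\,dm_x(y) &= T_1(\delta_o, m_x) \\
 &\le T_1(\delta_o, m_{x'}) + T_1(m_{x'}, m_x) \\
 &\le T_1(\delta_o, m_{x'}) + (1-\kappa)\,d(x',x) \\
 &\le d(o,x') + (1-\kappa)\,d(x',x) \le (1-\kappa)\,d(o,x) + 2\kappa r
\end{aligned}$$

and as above, this implies

$$\begin{aligned}
Mf(x) &\le \varphi\left(\int d(o,y)\,dm_x(y)\right) + \frac{\sigma(x)^2}{n_x} \\
 &\le \left((1-\kappa)\,d(o,x) + 2\kappa r - 2r\right)^2 + \frac{\sigma(x)^2}{n_x} \\
 &= (1-\kappa)^2\,\varphi(d(o,x)) + \frac{\sigma(x)^2}{n_x}
\end{aligned}$$

as needed.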

The last case to consider is $d(o,x) < r$. In this case we have

$$\begin{aligned}
\int d(o,y)\,dm_x(y) &= T_1(\delta_o, m_x) \\
 &\le T_1(\delta_o, m_o) + T_1(m_o, m_x) = J(o) + T_1(m_o, m_x) \\
 &\le J(o) + (1-\kappa)\,d(o,x) \le J(o) + r
\end{aligned}$$

So we need to bound $J(o)$. If $X$ is included in the ball of radius $r$ around
$o$, the result trivially holds, so that we can assume that there exists a point
$x$ with $d(o,x) \ge r$. Since $X$ is $r$-geodesic we can assume that $d(o,x) < 2r$
as well. Now $J(o) = T_1(m_o,\delta_o) \le T_1(m_o,m_x) + T_1(m_x,\delta_o) \le
(1-\kappa)\,d(o,x) + T_1(m_x,\delta_o) \le (1-\kappa)\,d(o,x) + d(o,x)$ by assumption,
so that $J(o) \le 4r$.

Plugging this into the above, for $d(o,x) < r$ we get $\int d(o,y)\,dm_x(y) \le 5r$
so that $\varphi\left(\int d(o,y)\,dm_x(y)\right) \le 9r^2$, hence
$Mf(x) \le 9r^2 + \frac{\sigma(x)^2}{n_x}$.

Combining the results, we get that whatever $x \in X$

$$Mf(x) \le (1-\kappa)^2 f(x) + \frac{\sigma(x)^2}{n_x} + 9r^2$$

as needed. □

7.2 Strong Bonnet-Myers

As mentioned above, positive Ricci curvature alone does not imply a
$1/\sqrt{\kappa}$-like diameter control, because of such simple counter-examples as the
discrete cube or the Ornstein-Uhlenbeck process. We now extract a property satisfied
by the ordinary Brownian motion on Riemannian manifolds (without drift),
which guarantees a genuine Bonnet-Myers theorem. Of course, this is of
limited interest since the only available example is Riemannian manifolds, but
nevertheless we found it interesting to find a sufficient condition expressed
in our present language.

Our definition of Ricci curvature controls the transportation distance between
the measures issuing from two points $x$ and $x'$ at a given time $t$. The
condition we will now use controls the transportation distance between the
measures issuing from two points at two different times. It is based on what
holds for Gaussian measures in $\mathbb{R}^N$. For any $x, x' \in \mathbb{R}^N$ and
$t, t' > 0$, let $m_x^{*t}$ and $m_{x'}^{*t'}$ be the laws of the standard Brownian motion
issuing from $x$ at time $t$ and from $x'$ at time $t'$, respectively. It is easy to
check that the $L^2$ transportation distance between these two measures is

$$T_2(m_x^{*t}, m_{x'}^{*t'})^2 = d(x,x')^2 + N(\sqrt{t}-\sqrt{t'})^2$$

hence, since $T_1 \le T_2$ and $\sqrt{a^2+b} \le a + b/2a$,

$$T_1(m_x^{*t}, m_{x'}^{*t'}) \le d(x,x') + \frac{N(\sqrt{t}-\sqrt{t'})^2}{2d(x,x')}$$

The important feature here is that, when $t'$ tends to $t$, the second term
is of second order in $t'-t$. This is no more the case if we add a drift term
to the diffusion.

We now take this inequality as an assumption and use it to mimic the
traditional proof of the Bonnet-Myers theorem. Here, for simplicity of notation
we suppose that we are given a continuous-time Markov chain; however,
the proof uses only a finite number of different values of $t$, so that
discretization is possible (this is important in Riemannian manifolds, because the heat
kernel is positive on the whole manifold at any positive time, and there is
no simple control on it far away from the initial point; taking a discrete
approximation with bounded steps solves this problem).

Proposition 51 (Strong $L^2$ Bonnet-Myers) Let $X$ be a metric
space equipped with a continuous-time random walk $m^{*t}$. Assume that $X$ is
$\varepsilon$-geodesic, and that there exist constants $\kappa > 0$, $C \ge 0$ such that
for any two small enough $t, t'$, for any $x, x' \in X$ with
$\varepsilon \le d(x,x') \le 2\varepsilon$ one has

$$T_1(m_x^{*t}, m_{x'}^{*t'}) \le e^{-\kappa \inf(t,t')}\, d(x,x') + \frac{C(\sqrt{t}-\sqrt{t'})^2}{2d(x,x')}$$

Assume moreover that $\varepsilon \le \frac{1}{4}\sqrt{C/2\kappa}$.
Then

$$\operatorname{diam} X \le \pi\sqrt{\frac{C}{2\kappa}}\left(1 + \frac{4\varepsilon}{\sqrt{C/2\kappa}}\right)$$

When $t = t'$, the assumption reduces to
$T_1(m_x^{*t}, m_{x'}^{*t}) \le e^{-\kappa t}\, d(x,x')$, which
is just the continuous-time version of the positive curvature assumption. The
constant $C$ plays the role of a diffusion constant, and is equal to $N$ for (a
discrete approximation of) Brownian motion on a Riemannian manifold. We
restrict the assumption to $d(x,x') \ge \varepsilon$ to avoid divergence problems for
$\frac{C(\sqrt{t}-\sqrt{t'})^2}{2d(x,x')}$ when $x' \to x$.

For the Brownian motion on an $N$-dimensional Riemannian manifold, we
can take $\kappa = \frac12 \inf \operatorname{Ric}$ by Bakry-Emery theory (the $\frac12$ is
due to the fact that the infinitesimal generator of Brownian motion is
$\frac12\Delta$), and $C = N$ as in $\mathbb{R}^N$. So we get the usual Bonnet-Myers theorem,
up to a factor $\sqrt{N}$ instead of $\sqrt{N-1}$ (similarly to our spectral gap estimate
in comparison with the Lichnerowicz theorem), but with the correct constant $\pi$.
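To make the comparison with the classical theorem concrete, here is a minimal sketch for the round unit sphere. It assumes that the conclusion of the proposition reads $\operatorname{diam} X \le \pi\sqrt{C/2\kappa}\,(1 + 4\varepsilon/\sqrt{C/2\kappa})$, that $\kappa = \frac12\inf\operatorname{Ric}$ and $C = N$ as stated above, and the standard fact that the unit sphere $S^N$ has $\operatorname{Ric} = N-1$ and diameter $\pi$. The function names are hypothetical.

```python
import math


def strong_bonnet_myers_bound(kappa, C, eps=0.0):
    """Diameter bound pi * sqrt(C / 2 kappa) * (1 + 4 eps / sqrt(C / 2 kappa))."""
    scale = math.sqrt(C / (2 * kappa))
    return math.pi * scale * (1 + 4 * eps / scale)


def round_sphere_comparison(N):
    """Return (bound from the proposition, classical Bonnet-Myers bound) on S^N, N >= 2."""
    ric = N - 1                      # assumption: Ricci curvature of the unit sphere
    kappa = ric / 2                  # kappa = (1/2) inf Ric
    ours = strong_bonnet_myers_bound(kappa, C=N)
    classical = math.pi * math.sqrt((N - 1) / ric)
    return ours, classical


if __name__ == "__main__":
    for N in (2, 3, 10, 100):
        print(N, round_sphere_comparison(N))
```

In the limit $\varepsilon \to 0$ the ratio of the two bounds is $\sqrt{N/(N-1)}$, which tends to $1$ as the dimension grows.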
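Proof - Let $x, x' \in X$. Since $X$ is $\varepsilon$-geodesic, we can find a sequence
$x = x_0, x_1, \dots, x_{k-1}, x_k = x'$ of points in $X$ with
$d(x_i,x_{i+1}) \le \varepsilon$ and $\sum d(x_i,x_{i+1}) = d(x_0,x_k)$. By taking a
subsequence (denoted $x_i$ again), we can assume that
$\varepsilon \le d(x_i,x_{i+1}) \le 2\varepsilon$ instead.

Set $t_i = \eta \sin^2\left(\frac{\pi\, d(x,x_i)}{d(x,x')}\right)$ for some (small) value of
$\eta$ to be chosen later. Now, since $t_0 = t_k = 0$ we have

$$d(x,x') = T_1(\delta_x,\delta_{x'}) \le \sum T_1(m_{x_i}^{*t_i}, m_{x_{i+1}}^{*t_{i+1}})
\le \sum e^{-\kappa \inf(t_i,t_{i+1})}\, d(x_i,x_{i+1}) + \frac{C(\sqrt{t_{i+1}}-\sqrt{t_i})^2}{2d(x_i,x_{i+1})}$$

by assumption. Now, for $a < b$ we have
$|\sin b - \sin a| = \left|2\sin\frac{b-a}{2}\cos\frac{a+b}{2}\right| \le (b-a)\left|\cos\frac{a+b}{2}\right|$
so that

$$\frac{C(\sqrt{t_{i+1}}-\sqrt{t_i})^2}{2d(x_i,x_{i+1})} \le \frac{C\eta\pi^2\, d(x_i,x_{i+1})}{2d(x,x')^2}\, \cos^2\left(\pi\,\frac{d(x,x_i)+d(x,x_{i+1})}{2d(x,x')}\right)$$

Besides, if $\eta$ is small enough, one has
$e^{-\kappa\inf(t_i,t_{i+1})} = 1 - \kappa\inf(t_i,t_{i+1}) + O(\eta^2)$. So we get

$$d(x,x') \le \sum d(x_i,x_{i+1}) - \kappa\inf(t_i,t_{i+1})\,d(x_i,x_{i+1})
+ \frac{C\eta\pi^2\, d(x_i,x_{i+1})}{2d(x,x')^2}\, \cos^2\left(\pi\,\frac{d(x,x_i)+d(x,x_{i+1})}{2d(x,x')}\right) + O(\eta^2)$$

Now the terms $\sum d(x_i,x_{i+1}) \cos^2\left(\pi\,\frac{d(x,x_i)+d(x,x_{i+1})}{2d(x,x')}\right)$
and $\sum \inf(t_i,t_{i+1})\,d(x_i,x_{i+1})$ are close to the integrals
$d(x,x')\int_0^1 \cos^2(\pi u)\,du$ and $d(x,x')\,\eta\int_0^1 \sin^2(\pi u)\,du$
respectively; the relative error in the Riemann sum is easily bounded by
$\pi\varepsilon/d(x,x')$ so that

$$d(x,x') \le d(x,x') - \kappa\eta\, d(x,x')\left(\frac12 - \frac{\pi\varepsilon}{d(x,x')}\right)
+ \frac{C\eta\pi^2}{2d(x,x')}\left(\frac12 + \frac{\pi\varepsilon}{d(x,x')}\right) + O(\eta^2)$$

hence, taking $\eta$ small enough,

$$d(x,x')^2 \le \frac{C\pi^2}{2\kappa}\; \frac{1 + 2\pi\varepsilon/d(x,x')}{1 - 2\pi\varepsilon/d(x,x')}$$

so that either $d(x,x') \le \pi\sqrt{C/2\kappa}$, or
$2\pi\varepsilon/d(x,x') \le 2\pi\varepsilon/\pi\sqrt{C/2\kappa} \le 1/2$ by
the assumption that $\varepsilon$ is small, in which case we use
$(1+a)/(1-a) \le 1 + 4a$ for $a \le 1/2$, hence the conclusion. □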



8 Transportation distance in Riemannian manifolds

Here we give the proofs of Proposition 6 and of the statements of Example 7
and Section 3.3.1.

We begin with Proposition 6 and evaluation of the Ricci curvature of the
random walk at scale $\varepsilon$.

Let $X$ be a smooth $N$-dimensional Riemannian manifold and let $x \in X$.
Let $v, w$ be unit tangent vectors at $x$. Let $\delta, \varepsilon > 0$ small enough. Let
$y = \exp_x(\delta v)$. Let $x' = \exp_x(\varepsilon w)$ and $y' = \exp_y(\varepsilon w')$ where
$w'$ is the tangent vector at $y$ obtained by parallel transport of $w$ along the
geodesic $t \mapsto \exp_x(tv)$.

The first claim is that
$d(x',y') = \delta\left(1 - \frac{\varepsilon^2}{2}K(v,w) + O(\delta\varepsilon^2 + \varepsilon^3)\right)$.

We suppose for simplicity that $w$ and $w'$ are orthogonal to $v$.

We will work in cylindrical coordinates along the geodesic $t \mapsto \exp_x(tv)$.
Let $v_t = \frac{d}{dt}\exp_x(tv)$ be the speed of this geodesic. Let $E_t$ be the orthogonal
of $v_t$ in the tangent space at $\exp_x(tv)$. Each point $z$ in some neighborhood
of $x$ can be uniquely written as $\exp_{\exp_x(\tau(z)v)}(\varepsilon\zeta(z))$ for some
$\tau(z) \in \mathbb{R}$ and $\zeta(z) \in E_{\tau(z)}$.

Consider the function $f$ equal to the distance of a point to $\exp_x(E_0)$
(taken in some small enough neighborhood of $x$), equipped with a $-$ sign if
the point is not on the same side of $E_0$ as $y$. Clearly $f$ is 1-Lipschitz, so that
$d(x',y') \ge f(y') - f(x')$.

The distance from $\exp_x(E_0)$ to $y'$ is realized by some geodesic $\gamma$ starting
at some point of $\exp_x(E_0)$ and ending at $y'$. If $\delta$ and $\varepsilon$ are small
enough, this geodesic is arbitrarily close to the Euclidean situation so that the
coordinate $\tau$ is strictly increasing along $\gamma$. Let us parametrize $\gamma$ using
the coordinate $\tau$, so that $\tau(\gamma(t)) = t$. Let also
$w_t = \zeta(\gamma(t)) \in E_t$. In particular, $w_\delta = w'$.

Now by definition we have $\gamma(t) = \exp_{\exp_x(tv)}(\varepsilon w_t)$. Considering the
family of geodesics $s \mapsto \exp_{\exp_x(tv)}(s w_t)$ and applying the Jacobi equation
yields

$$\left|\frac{d\gamma(t)}{dt}\right|^2 = |v_t|^2 + 2\varepsilon\langle v_t, \dot w_t\rangle + \varepsilon^2|\dot w_t|^2 - \varepsilon^2\langle R(w_t,v_t)w_t, v_t\rangle + O(\varepsilon^3)$$

where $\dot w_t = \frac{D}{dt}w_t$. But since by definition $w_t \in E_t$, we have
$\langle v_t, \dot w_t\rangle = 0$. Since moreover $|v_t| = 1$ we get

$$\left|\frac{d\gamma(t)}{dt}\right| = 1 + \frac{\varepsilon^2}{2}|\dot w_t|^2 - \frac{\varepsilon^2}{2}\langle R(w_t,v_t)w_t, v_t\rangle + O(\varepsilon^3)$$

which is always greater than
$1 - \frac{\varepsilon^2}{2}\langle R(w_t,v_t)w_t, v_t\rangle + O(\varepsilon^3)$. Integrating from
$t = 0$ to $t = \delta$ and using that
$\langle R(w_t,v_t)w_t, v_t\rangle = K(w,v) + O(\delta)$ yields that
the length of the geodesic is at least

$$\delta\left(1 - \frac{\varepsilon^2}{2}K(v,w) + O(\varepsilon^3) + O(\varepsilon^2\delta)\right)$$

so that the distance from $x'$ to $y'$ is at least this quantity. But this value is
achieved for $\dot w_t = 0$, in which case $\gamma(0) = x'$ by definition, so this is exactly
$d(x',y')$. This proves Proposition 6.
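The expansion of $d(x',y')$ can be checked in closed form on the unit sphere $S^2 \subset \mathbb{R}^3$, where $K \equiv 1$. The sketch below is an illustration under stated assumptions: it uses the standard formula $\exp_p(\varepsilon w) = \cos(\varepsilon)\,p + \sin(\varepsilon)\,w$ for a unit tangent vector $w$ at $p$, and the fact that a vector orthogonal to the plane of a great circle is parallel along it. The function name and the chosen points are hypothetical.

```python
import math


def transported_distance_on_sphere(delta, eps):
    """Geodesic distance d(x', y') on the unit sphere S^2.

    x = e1, v = e2, w = e3 (orthogonal to v), y = exp_x(delta v),
    x' = exp_x(eps w), y' = exp_y(eps w') with w' = w by parallel transport.
    """
    x = (1.0, 0.0, 0.0)
    y = (math.cos(delta), math.sin(delta), 0.0)
    w = (0.0, 0.0, 1.0)
    x_prime = tuple(math.cos(eps) * a + math.sin(eps) * b for a, b in zip(x, w))
    y_prime = tuple(math.cos(eps) * a + math.sin(eps) * b for a, b in zip(y, w))
    chord = math.dist(x_prime, y_prime)
    # Chord-to-arc conversion is numerically stable for nearby points.
    return 2.0 * math.asin(chord / 2.0)


if __name__ == "__main__":
    delta, eps = 0.01, 0.05
    ratio = transported_distance_on_sphere(delta, eps) / delta
    print(ratio, 1 - eps**2 / 2)  # predicted value uses K(v, w) = 1
```

On the sphere the exact ratio is close to $\cos\varepsilon$, so the discrepancy with $1 - \varepsilon^2/2$ is of order $\varepsilon^4$, well within the stated $O(\delta\varepsilon^2 + \varepsilon^3)$ error.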

Let us now prove the statement of Example 7. Let $\mu_0, \mu_1$ be the uniform
probability measures on the balls of radius $\varepsilon$ centered at $x$ and $y$ respectively.
We have to prove that

$$T_1(\mu_0,\mu_1) = d(x,y)\left(1 - \frac{\varepsilon^2}{2(N+2)}\operatorname{Ric}(v,v)\right)$$

up to higher-order terms.

Let $\mu'_0, \mu'_1$ be the images under the exponential map, of the uniform
probability measures on the balls of radius $\varepsilon$ in the tangent spaces at $x$ and $y$
respectively. So $\mu'_0$ is a measure having density $1 + O(\varepsilon^2)$ w.r.t. $\mu_0$, and
likewise for $\mu'_1$.

If we average Proposition 6 over $w$ in the ball of radius $\varepsilon$ in the tangent
space at $x$, we get that

$$T_1(\mu'_0,\mu'_1) \le d(x,y)\left(1 - \frac{\varepsilon^2}{2(N+2)}\operatorname{Ric}(v,v)\right)$$

up to higher-order terms, since the coupling by parallel transport realizes
this value. Indeed, $\operatorname{Ric}(v,v)$ is the sum of $K(v,w)$ for $w$ in an orthonormal
basis of the tangent space at $x$. Consequently, the average of $K(v,w)$ on the
unit sphere is $\frac{1}{N}\operatorname{Ric}(v,v)$. Averaging on the ball instead of the sphere
yields a $\frac{1}{N+2}$ factor instead.

Now the density of $\mu'_0, \mu'_1$ with respect to $\mu_0, \mu_1$ is $1 + O(\varepsilon^2)$.
Moreover the $O(\varepsilon^2)$ terms decompose as the sum of an $O(d(x,y)\varepsilon^2)$ term
and an $O(\varepsilon^2)$ term which is the same for $\mu'_0$ and $\mu'_1$ (indeed, $\mu'_0$ and
$\mu'_1$ coincide when $x = y$). Plugging this in the estimate above, we get the
inequality for $T_1(\mu_0,\mu_1)$ up to higher-order terms.

The converse inequality is proven as follows: if $f$ is any 1-Lipschitz function,
the $L^1$ transportation distance between measures $\mu_0$ and $\mu_1$ is at least
the difference of the integrals of $f$ under $\mu_0$ and $\mu_1$ (and actually, a clever
choice of $f$ realizes this transportation distance, see Theorem 1.14 in [Vil03]).
Arguments similar to the above for integrating under $\mu_0$ and $\mu_1$, applied to
the function $f$ above equal to the distance of a point to the set $\exp_x(E_0)$,
yield the desired inequality.

Finally, let us briefly sketch the proofs of the other statements of
Section 3.3.1, namely, evaluation of the spread and local dimension
(Definition 18). Up to a multiplicative factor $1 + O(\varepsilon)$, these can be computed in
the Euclidean space.

A simple computation shows that the expectation of the square distance
of two points taken at random in a ball of radius $\varepsilon$ is
$\varepsilon^2\frac{2N}{N+2}$, hence the value $\varepsilon^2\frac{N}{N+2}$ for the squared
spread $\sigma(x)^2$.

To evaluate the local dimension (Definition 18), we have to bound the
maximal variance of a 1-Lipschitz function on a ball of radius $\varepsilon$. We will
prove that the local dimension $n_x$ lies between $N-1$ and $N$. A
projection to a coordinate axis provides a function with variance
$\frac{\varepsilon^2}{N+2}$ so that local dimension is at most $N$. For the other bound, let
$f$ be a 1-Lipschitz function on the ball and let us compute an upper bound for its
variance. Take $\varepsilon = 1$ for simplicity. Write the ball of radius 1 as the union of the
spheres $S_r$ of radii $r \le 1$. Let $v(r)$ be the variance of $f$ restricted to the sphere
$S_r$, and let $a(r)$ be the average of $f$ on $S_r$. Then associativity of variances gives

$$\operatorname{Var} f = \int_{r=0}^{1} v(r)\,d\mu(r) + \operatorname{Var}_\mu a(r)$$

where $\mu$ is the measure on the interval $[0;1]$ given by $\frac{r^{N-1}}{Z}\,dr$ with
$Z = \int_{r=0}^{1} r^{N-1}\,dr = \frac{1}{N}$.

Since the variance of a 1-Lipschitz function on the $(N-1)$-dimensional
unit sphere is at most $\frac{1}{N}$, we have $v(r) \le \frac{r^2}{N}$ so that
$\int_{r=0}^{1} v(r)\,d\mu(r) \le \frac{1}{N+2}$.
To evaluate the second term, note that $a(r)$ is again 1-Lipschitz as a function
of $r$, so that
$\operatorname{Var}_\mu a(r) = \frac12\iint (a(r)-a(r'))^2\,d\mu(r)\,d\mu(r')$ is at most
$\frac12\iint (r-r')^2\,d\mu(r)\,d\mu(r') = \frac{N}{(N+1)^2(N+2)}$. So finally

$$\operatorname{Var} f \le \frac{1}{N+2} + \frac{N}{(N+1)^2(N+2)}$$

so that the local dimension $n_x$ is bounded below by
$\frac{N(N+1)^2}{N^2+3N+1} \ge N-1$.
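The arithmetic behind the local dimension bound can be verified with exact fractions. This is a minimal sketch, assuming $\varepsilon = 1$, the radial measure $d\mu(r) = N r^{N-1}\,dr$ on $[0;1]$ (so that $E_\mu[r^p] = N/(N+p)$), and the squared spread $N/(N+2)$; the function names are hypothetical.

```python
from fractions import Fraction


def radial_moment(N, p):
    """E[r^p] under d mu(r) = N r^(N-1) dr on [0, 1], i.e. N / (N + p)."""
    return Fraction(N, N + p)


def variance_upper_bound(N):
    """Upper bound on Var f for a 1-Lipschitz f on the unit ball of R^N."""
    sphere_part = radial_moment(N, 2) / N                            # int r^2/N d mu
    radial_part = radial_moment(N, 2) - radial_moment(N, 1) ** 2     # Var_mu(r)
    return sphere_part + radial_part


def local_dimension_lower_bound(N):
    """Squared spread N/(N+2) divided by the variance upper bound."""
    return Fraction(N, N + 2) / variance_upper_bound(N)


if __name__ == "__main__":
    for N in (1, 2, 3, 10):
        print(N, local_dimension_lower_bound(N))
```

The lower bound stays within one unit of $N$ for every dimension, which matches the claim that $n_x$ lies between $N-1$ and $N$.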